secure compiler seminar 9/12 survey on design of secure low-level languages
Post on 17-Jan-2016
28 Views
Preview:
DESCRIPTION
TRANSCRIPT
Secure Compiler Seminar 9/12
Survey on Design ofSecure Low-Level Languages
Toshihiro YOSHINO, Yonezawa Lab.<tossy-2@yl.is.s.u-tokyo.ac.jp>
A Secure Low-Level Language is Needed
As the secure compiler target language
“secure” means it has a method to prove program’s properties Memory safe, control flow safe, … For this, its concrete formal model
should be given It should also be
low-level To reduce complexity of JIT compiler
(In other words, TCB)
High-LevelProgram
MachineExecutable
SecureLow-LevelLanguage
Secure Compiler
JIT compilation
Verifier
Two major approaches
1. TAL, PCC Extension to conventional assembly languages Utilizes certain logic (such as type theories) to prove safety
2. Virtual Machines Introduce intermediate languages of their own Many of them adopt safe-by-construction design
e.g. Java VM is semantically safe in memory operation Java VM, Microsoft CIL, mvm [Franz et al. 2003],
Jinja [Klein et al. 2006], ADL [Yoshino 2006], …
Existing Researches
Comparison: PCC vs. Java VM
PCC (Extended) Machine code + Proof Generated code and proof are machine-depe
ndent Requires one VC implementation for each archit
ecture
Java VM Verifier ensures type and control flow safety It often restricts optimizations
Leads to performance degradation High cost to perform verification
Stack is nothing more than a set of untyped (variable-number) registers
Machine Independent
Machine Dependent
Limitations ofJava Bytecode Verification
Initialization check is incomplete Correlation with other variables is not taken into account Example:
class Test { int test(boolean b) { int i; try { if(b) return 1; i = 2; } finally { if(b) i = 3; } return i; }}
Incomplete common subexpression elimination Cannot eliminate c.s. in address calculations (array refs)
Then What Should We Do?
Maximize machine-independent part Avoids porting cost of the system Tradeoff against the size of TCB (Trusted Comp
uting Base) But recent works in PCC and TAL (e.g. Foundational
PCC) aim solely to minimize TCB
Reduce proof size and generation cost PCC requires much effort to produce proof, bec
ause the target’s level is very low Registers and memory are untyped, etc.
mvm [Franz et al. 2003]
Aimed to find the semantics level that: Is effective at supporting proof-carrying code Can also be translated efficiently
into highly performing native code(on many platforms)
Separated design between VMlayer and PCC layer
[1] M. Franz et al. A Portable Virtual Machine Target for Proof-Carrying Code. IVME ’03.
mvm: Virtual Machine Design
Register-based architecture The number of registers is not bounded Registers are categorized by the type of values:
Integer, Boolean, Pointer, Address Pointer registers are used to store pointers to heap
objects (more specifically, array heads) Address registers are for storing results of address
arithmetic• Bounds check is not performed in arithmetic, so it has to
be checked in higher layer
Heap can be used to store objects Heap model is explained next
mvm: Virtual Machine Design
mvm Virtual Machine Heap
1
42
…
false
true
… … …
Integer Boolean
label1: instr instr …label2: …
Pointer Address
mvm: Heap Model
mvm heap consists of arrays of objects
Object representation in mvm Each object is tagged
Tag can only be written with new operation and is immutable after creation
Two sections of data area: values and pointers Integers, booleans are stored into the first section Pointers are stored into the second section
1 42
mvm: Heap Model
A type is associated with its tag value, layout and structure This association is managed by compiler Layout describes the sizes of data sections Structure describes the possible substructure inside
pointer section
Example of type information
<> means a tuple
{} means disjunction
datatype T = Int of int | Pair of int * int
… T list …
mvm: Heap Model
Example of a T list object tree
T list
T
int*int
mvm: Instructions
Arithmetics, Logical calculation Similar to many other languages ;-)
Branch Unconditional: goto label Conditional: brtrue bi, label / brfalse bi, label
Condition must be taken from a boolean register Jump is allowed only to a label
Conditional by object tag (RTTI): iftag
mvm: Instructions
Object creation and access pj = new(tag, ik)
Creates an array of ik objects with type tag
r = load([sizev, sizep] | tag, pk, offset)
store([sizev, sizep] | tag, pk, offset, r) sizes and tag are used to check memory safety
Pointer registers and address registers Object access also permit address registers ak
This distinction is for supporting garbage collection Address registers always contain “derived” pointers
mvm: Instructions
Accessing arrays an = adda([sizev, sizep] | tag, pk, il)
Calculates address of the il-th element in an array of type tag stored at pk
in = getlen(pk)
Guards Bounds checking: CHECKLEN(pk, il)
Validity checking: CHECKNOTNULL(pk)
Type checking: CHECKTAG(pi, [sizev, sizep] | tag) These guards are inserted when static checking failed
An Example mvm Program
Type Safety in mvm Programs
Operations on primitives are all type-safe Because registers to store values are distinct
Type-safety proofs are needed only for non-primitive operations Pointers, arrays and records For every pointer operation, check that result
pointer: Points to the beginning of an array, record or value Points to an object of the correct type
Jinja [Klein et al. 2006]
A Java-like programming language built on Isabelle/HOL Formal description of Jinja language, Jinja VM a
nd compiler are given
Several properties were machine-checked Big step evaluation and small step evaluation (at
omic operations) are equivalent Compiler correctness
[2] G. Klein, T. Nipkow. A Machine-Checked Model for a Java-Like Language, Virtual Machine, and Compiler. TOPLAS 28(4), 2006.
Jinja Language
“Jinja is not Java”
Object-oriented language with exceptions A program is a set of class definitions and,
a class consists of several fields and methods Method body is an expression
Overriding is supported as in Java But not overloading, because it is complicated
Language is statically typed Type system ensures that the execution of a well-typed
program never gets stuck
Jinja Language: Language Elements
Values Boolean Bool b, Integer Intg i, Reference Addr a Null reference Null, Dummy value Unit
Expressions Val v , binary operations e1 op e2 , Var V , V := e , e1; e2 , … Conditional: if (e) e1 else e2 , while (e) e’ / Block: {V:T, e} Object construction: new C Casting: Cast C e Field access: e.F{D} , e.F{D} := e
D is annotation added in preprocessing (e.g., by typechecker) Method call: e.M(e, e, …) Exception: throw e , try e1 catch(C V) e2
Big step semantics Typical operational semantics
State = <Heap, Local Variables> Detail abbreviated because nothing special
Small step semantics Finer-grained semantics
One-step evaluation Useful for formalizing parallelism (?)
Each (small) operation is considered atomic Not discussed in the paper
Jinja Language: Semantics
Jinja Language: Semantics
Big and small semantics are proven to be equivalent
wwf-J-prog means “weak well-formedness”, which is defined by the following properties:
Number of parameter types and of parameter names are equal
“this” is not included in parameter list Free variables in the method body only refer to this o
r these parameters
Jinja VM
Similar to Java VM Stack-based machine with heap
State = <addr option, heap, frame list> First element is possibly a generated exception Third element is a call-stack
Frame =<stack, registers, cname, mname, pc>
where stack = value list, registers = value list Evaluation of operands are done on stack Registers are for storing local variables
Jinja VM: Instructions
Basic operations Push v / Pop Register operations: Load n / Store n Arithmetics: IAdd, … Logical operations: CmpEq, …
Object manipulation Construction: New cname Casting: Checkcast cname Field access:
Getfield vname cname / Putfield vname cname
Method invocation: Invoke mname n
Jinja VM: Instructions
Control flow operation Branching: Goto n / IfFalse n
n is relative offset from the instruction Exit from a method: Return
Exception Throwing an exception: Throw Information about exception handlers (try-catch)
are attached to method declarations Handler is retrieved from there when needed
Jinja VM: Semantics
Please refer to the paper for detail Basically, straightforward and intuitive
In this level, there are no runtime checks For example, IAdd (Integer addition) does not c
heck whether its argument is really integers Otherwise, the result is unspecified
This kind of checks is performed by a bytecode verifier
Jinja VM Bytecode Verification
JVM relies on the following assumptions: Types are correct No overflow or underflow in stack Code containment Register initialization before use
Just the same as Java VM
Bytecode verifier statically ensures these assumptions
Jinja VM Bytecode Verification
Abstract interpretation Instead of values, consider only types
Intg 1Addr 1
Addr 1
Addr 3
0
State
IntClass A
Class A
Class B
Int
State Type
Addr 2
Addr 1
Addr 3
0Class B
Class A
Class B
Int
JVM Program
Jinja Compiler and its Correctness
2-staged compilation1. Map parameter names to register indices
Assign local variables to registers Gather variable occurrences and use it to lookup
2. Code generation expression → instruction list (compE2)
• Straightforward definition
Exception table generation (compEx2)
• Separated from compE2, because exception table must contain global addresses
Jinja Compiler and its Correctness
Correctness of compilation
If a program is weakly well-formed, then:
Jinjaprogram
JVMbytecode
Heap,Vars
Heap,Vars
Heap,[Frame]
Heap,[]
compilation
Implementation of Jinja
http://afp.sourceforge.net/entries/Jinja.shtml
About 20kLoC in Isabelle/HOL Over 1,000 theorems are defined It takes about 25 min. to process these proofs o
n a 3GHz Pentium 4 machine with 1GB RAM
Summary of Today’s Talk
We would need a secure low-level language for the target of a secure compiler Minimize machine-dependent part to reduce implementation cost Also reduce cost for proof generation
To answer this, surveyed two VM projects mvm
Aimed to find the “sweet spot” that reconciles high performance and small type-safety proofs
Jinja Constructed a unified formal model of a Java-like language, the und
erlying VM and compiler In contrast to mvm, this research is oriented toward higher-level lan
guages and compiler’s properties
How about ADL [Yoshino 2006] …?
The position of ADL is close to mvm To provide a common basis of implementing ver
ifier for low-level languages Assumed translation direction is opposite
mvm is an intermediate code of compilation ADL is designed to simulate real machines
MachineCodeSecure L3
JVM, mvm
ADL
How about ADL [Yoshino 2006] …?
ADL takes minimalist approach Only 7 kinds of commands Instead, expression-based design to allow complex
formulae to be easily written
ADL can be used as an intermediate language? Probably some modification needed
Register allocation is done, but except for variables Minimalist design, however, may increase complexity in
constructing a verification logic• Abstract interpretation is often not sufficient, so a verification
logic may want to calculate exact values
More References
LLVM Project [Lattner 2000]
http://www.llvm.org/ Use VM for interprocedural optimization
SafeTSA [Amme et al. 2001]
SSA-based language for mobile code security
Dis virtual machine [Winterbottom et al. 1997]
Omniware system [Adl-Tabatabai et al. 1996]
(Typical) Compiler Construction andSeveral Intermediate Languages
Lexing /Parsing
TypeChecking
Normalize(SSA, etc.)
OptimizeIntermediate Code
Generation
RegisterAllocation
Target CodeGeneration
PrettyPrinting
High-LevelProgram
MachineExecutable
Java, CIL,
Jinja(VM)TAL,PCC
mvm
ADL
LLVM,SafeTSA
top related