assume/guarantee reasoning using abstract interpretation

Post on 11-Jan-2016


Assume/Guarantee Reasoning using Abstract Interpretation

Nurit Dor, Tom Reps, Greta Yorsh, Mooly Sagiv

Limitations of Whole Program Analysis

• Complexity of Chaotic Iterations

• Not all the source code is available
  – Large libraries
  – Software components

• No interaction with the client
  – Program design

A Motivating Example

List rev(List x) {
  List t;
  if (x == null) return null;
  t = x->next;
  x->next = null;   /* detach the head so that only x itself is appended */
  return append(rev(t), x);
}

List append(List x, List y) {
  List e;
  if (x == null) return y;
  e = malloc(…);
  e->data = x->data;
  e->next = append(x->next, y);
  return e;
}

List rev(List x)

requires acyclic(x)

ensures $$=reverse(x)

List append(List x, List y)

requires acyclic(x) ∧ acyclic(y)

ensures $$= x || y

Both requires/ensures specifications above are procedure contracts.

Contracts can also be used for runtime testing.
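Because contracts can be executed, one way to use them for runtime testing is to wrap a procedure with checks of its requires and ensures clauses. The following is a minimal C sketch under stated assumptions. The `Node` type and the helpers `acyclic`, `length`, `is_reverse` and `rev_checked` are hypothetical. `acyclic` uses Floyd's cycle test. `rev` here appends a fresh one-cell copy of the head, so `x` is left unchanged. Intermediate copies are leaked, which is tolerable in a test-harness sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical list type: the slides only name List, data and next. */
typedef struct Node {
    int data;
    struct Node *next;
} Node, *List;

/* requires acyclic(x): Floyd's tortoise-and-hare cycle test. */
bool acyclic(List x) {
    List slow = x, fast = x;
    while (fast != NULL && fast->next != NULL) {
        slow = slow->next;
        fast = fast->next->next;
        if (slow == fast) return false;
    }
    return true;
}

static size_t length(List x) {            /* only called on acyclic lists */
    size_t n = 0;
    for (; x != NULL; x = x->next) n++;
    return n;
}

/* ensures $$ = reverse(x): the i-th cell of r holds the (n-1-i)-th datum of x. */
bool is_reverse(List r, List x) {
    size_t n = length(x);
    if (length(r) != n) return false;
    for (size_t i = 0; i < n; i++, r = r->next) {
        List p = x;
        for (size_t k = 0; k + 1 + i < n; k++) p = p->next;
        if (r->data != p->data) return false;
    }
    return true;
}

/* append as on the slide: copies the cells of x and shares y. */
List append(List x, List y) {
    if (x == NULL) return y;
    List e = malloc(sizeof *e);            /* allocation failure not handled */
    e->data = x->data;
    e->next = append(x->next, y);
    return e;
}

/* Non-destructive rev: appends a fresh one-cell copy of the head. */
List rev(List x) {
    if (x == NULL) return NULL;
    List head = malloc(sizeof *head);      /* allocation failure not handled */
    head->data = x->data;
    head->next = NULL;
    return append(rev(x->next), head);
}

/* rev with its contract checked at run time (disabled when NDEBUG is set). */
List rev_checked(List x) {
    assert(acyclic(x));                    /* requires acyclic(x) */
    List r = rev(x);
    assert(is_reverse(r, x));              /* ensures $$ = reverse(x) */
    return r;
}
```

Checking `is_reverse` takes time quadratic in the list length. This illustrates the cost of dynamic checks listed among the issues below.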

Challenges in A/G Reasoning

• Specifying procedure contracts

• Performing abstract interpretation using contracts

Specifying Contracts

• Executable specifications
  – assert
  – Can use loops
  – Expressive
  – Natural
  – But what about side-effects?

• Declarative specifications
  – Types
  – First-order logic
  – Z

• Hybrid
  – Larch
  – Java Modeling Language

Procedure Contracts and Modularity

• The postcondition does not reveal the whole story

void foo(List x, List z) {

List y, t ;

y = rev(x);

t = rev(z);

}

List rev(List x)

requires acyclic(x)

ensures $$=reverse(x)

void foo(List x, List z)

requires acyclic(x) ∧ acyclic(z)

ensures true
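To see why the postcondition alone is not enough, consider a hypothetical destructive implementation, `rev_in_place`. It satisfies `$$ = reverse(x)` by rewriting the next fields of x's cells. If z shares a tail with x, the call y = rev(x) silently changes z, and nothing in rev's contract says so. A minimal C sketch follows; the type and helper names are assumptions.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Node { int data; struct Node *next; } Node, *List;

/* Hypothetical destructive reversal. Its result satisfies $$ = reverse(x),
   but it rewrites the next field of every cell of x. */
List rev_in_place(List x) {
    List prev = NULL;
    while (x != NULL) {
        List next = x->next;
        x->next = prev;
        prev = x;
        x = next;
    }
    return prev;
}

/* Copies up to max data values of list l into out and returns how many. */
size_t to_array(List l, int *out, size_t max) {
    size_t n = 0;
    for (; l != NULL && n < max; l = l->next) out[n++] = l->data;
    return n;
}
```

Take x = [1,2,3] and z = [9,2,3], sharing the cells that hold 2 and 3. After y = rev_in_place(x), y is [3,2,1], but z has become [9,2,1].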

Procedure Contracts and Modularity

• Specify the parts of the state that may be modified

• But it is difficult to define potential side-effects

• Can use abstract interpretation


Issues in Specifying Contracts

• Expressiveness

• Conciseness

• Naturalness

• Reuse

• Cost of dynamic check (model checking)

• Decidability

• Cost of abstract interpretation

Plan

• CSSV: a tool for verifying the absence of buffer overruns (N. Dor)

• An algorithm for performing abstract interpretation as precisely as possible using specifications

CSSV: Towards a Realistic Tool for Statically Detecting All Buffer Overflows in C

Nurit Dor, Michael Rodeh, Mooly Sagiv

DAEDALUS project

/* from web2c [strpascal.c] */

void foo(char *s)

{

while (*s != ' ')

s++;

*s = 0;

}
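The loop stops only at a space. If s contains no space, it reads past the terminating null character, and the final store `*s = 0` may write outside the string. Below is a minimal sketch of a bounded variant. The name `foo_bounded` is hypothetical, and the slide does not say whether web2c's callers always supply a space.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical bounded variant of foo: it also stops at the terminating null,
   so it never reads or writes past the end of the string. */
void foo_bounded(char *s) {
    while (*s != ' ' && *s != '\0')
        s++;
    *s = '\0';   /* truncates at the first space, or rewrites the existing null */
}
```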

Vulnerabilities of C programs

• Null dereference

• Dereference to unallocated storage

• Out-of-bounds pointer arithmetic

• Out-of-bounds update

Is it common?

• General belief: yes!

• FUZZ study
  – Test reliability by random input
  – Tens of applications on 9 different UNIX systems
  – 18%–23% hang or crash

• CERT advisory
  – Up to 50% of attacks are due to buffer overflow

COMMON AND DANGEROUS

CSSV’s Goals

• Efficient conservative static checking algorithm
  – Verify the absence of buffer overflow, not just find bugs
  – All C constructs: pointer arithmetic, casting, dynamic memory, …
  – Real programs
  – Minimum false alarms

Verifying the Absence of Buffer Overflow Is Non-trivial

void safe_cat(char *dst, int size, char *src)
{
  if (size > strlen(src) + strlen(dst)) {
    dst = dst + strlen(dst);
    strcpy(dst, src);
  }
}

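Assertions derived backwards from the call to strcpy:

Before strcpy(dst, src): {string(src) ∧ alloc(dst) > len(src)}

Before dst = dst + strlen(dst): {string(src) ∧ string(dst) ∧ alloc(dst + len(dst)) > len(src)}

Before the if statement: {string(src) ∧ string(dst) ∧ ((size > len(src) + len(dst)) ⇒ alloc(dst + len(dst)) > len(src))}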
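A short calculation shows why the test in safe_cat suffices. It assumes that alloc(p) denotes the number of bytes from p to the end of its block, so that alloc(dst + k) = alloc(dst) − k for 0 ≤ k < alloc(dst). It also uses alloc(dst) == size from safe_cat's precondition.

```latex
\begin{align*}
\mathit{alloc}(\mathit{dst} + \mathit{len}(\mathit{dst}))
  &= \mathit{alloc}(\mathit{dst}) - \mathit{len}(\mathit{dst})
   = \mathit{size} - \mathit{len}(\mathit{dst}) \\
\mathit{alloc}(\mathit{dst} + \mathit{len}(\mathit{dst})) > \mathit{len}(\mathit{src})
  &\iff \mathit{size} - \mathit{len}(\mathit{dst}) > \mathit{len}(\mathit{src}) \\
  &\iff \mathit{size} > \mathit{len}(\mathit{src}) + \mathit{len}(\mathit{dst})
\end{align*}
```

So the guard is equivalent to the precondition strcpy needs at the call. Under safe_cat's precondition, the implication in the last assertion therefore always holds.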

Can this be done for real programs?

• Complex linear relationships → polyhedra [CH78]

• Pointer arithmetic → pointer analysis

• Loops → widening

• Procedures → procedure contracts

Very few false alarms!

Linear Relation Analysis

Cousot and Halbwachs [CH78]: statically analyze linear relations among program variables of the form

a1·var1 + a2·var2 + … + an·varn ≤ b

A conjunction of such constraints describes a polyhedron.

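Example: y ≥ 1 ∧ x + y ≥ 3 ∧ −x + y ≤ 1

[Figure: this polyhedron plotted in the (x, y) plane, axes from 0 to 3]

Generator representation: vertices V = {(1,2), (2,1)}, rays R = {(1,0), (1,1)}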
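The two representations of the example polyhedron can be cross-checked mechanically. Every vertex must satisfy each constraint a·x + b·y ≤ c, and every ray (dx, dy) must satisfy a·dx + b·dy ≤ 0. Below is a minimal C sketch with the three example constraints hard-coded. It checks only that the generators lie inside the constraint polyhedron, not that the two descriptions are equal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One linear constraint a*x + b*y <= c. */
typedef struct { int a, b, c; } Constraint;

/* The example polyhedron, each constraint rewritten in the form a*x + b*y <= c:
   y >= 1 becomes -y <= -1, x + y >= 3 becomes -x - y <= -3, -x + y <= 1 stays. */
static const Constraint P[] = { {0, -1, -1}, {-1, -1, -3}, {-1, 1, 1} };
#define NP (sizeof P / sizeof P[0])

/* Does the point (x, y) satisfy every constraint? */
bool contains_point(int x, int y) {
    for (size_t i = 0; i < NP; i++)
        if (P[i].a * x + P[i].b * y > P[i].c) return false;
    return true;
}

/* Is (dx, dy) a ray of the polyhedron, i.e. a*dx + b*dy <= 0 for every constraint? */
bool admits_ray(int dx, int dy) {
    for (size_t i = 0; i < NP; i++)
        if (P[i].a * dx + P[i].b * dy > 0) return false;
    return true;
}
```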

C String Static Verifier

• Detects string violations
  – Buffer overflow (update beyond bounds)
  – Unsafe pointer arithmetic
  – References beyond null termination
  – Unsafe library calls

• Handles full C
  – Multi-level pointers, pointer arithmetic, structures, casting, …

• Applied to real programs
  – Public domain software
  – C code from Airbus

Plan

• Semantics for C programs

• Contract language

• Static analysis algorithm

• Implementation

Standard C Semantics

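void safe_cat(char *dst, int size, char *src)
{
  if (size > strlen(src) + strlen(dst)) {
    dst = dst + strlen(dst);
    strcpy(dst, src);
  }
}

[Figure: a concrete memory state. dst, size and src are stored at 0x480580, 0x480584 and 0x480588; size holds 125; dst points to a null-terminated array starting with 'x', and src holds 0x6000009, pointing at a 'y' in another null-terminated array]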

Instrumented C Semantics

[Figure: the same memory state, where every base address also records its allocation size (asize): 4 bytes for each of dst, size and src, and 130 and 245 bytes for the two character arrays]

[Figure: the same state, with each pointer value split into a base address and an offset; for example, src = 0x6000009 has base 0x6000000 and offset 9]

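The instrumented semantics checks the validity of C expressions (ANSI C cleanness). For example, dst = dst + i is safe only if

offset(dst) + i ≤ asize(base(dst))

[Figure: dst points offset(dst) bytes into the block at base(dst), whose size is asize(base(dst)), and is advanced by i]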
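Here is a minimal sketch of how an instrumented pointer could carry base, offset and asize and refuse unsafe arithmetic. The `IPtr` type and `iptr_add` are hypothetical. Besides the upper bound above, the sketch also rejects negative offsets. This is an added assumption, in line with ANSI C's rule that pointer arithmetic must stay within the block or one past its end.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical instrumented pointer: the block it points into (base), the
   block's allocated size (asize) and its position inside the block (offset). */
typedef struct {
    const char *base;
    size_t asize;
    long offset;
} IPtr;

/* Performs dst = dst + i only if 0 <= offset(dst) + i <= asize(base(dst));
   pointing one past the end is allowed. On a violation it returns false
   and leaves dst unchanged. */
bool iptr_add(IPtr *dst, long i) {
    long target = dst->offset + i;
    if (target < 0 || target > (long)dst->asize) return false;
    dst->offset = target;
    return true;
}
```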

Contracts

• Defined in the instrumented semantics

• Specify the string behavior of procedures (as C expressions)
  – Precondition
  – Postcondition: may use values at procedure entry
  – Side-effects: can be approximated from pointer information

• No need to specify pointer information
  – Not aiming for modular pointer analysis

Contracts’ Advantages

• Modular analysis
  – Use contracts on call statements
  – Not all the code is available
  – Enable more expensive analyses

• User control of the verification
  – Detect errors at the point of the logical error
  – Improve the precision of the analysis

• Check additional properties
  – Beyond ANSI C

Example

char* strcpy(char* dst, char* src)
  requires ( string(src) ∧ alloc(dst) > len(src) )
  mod dst
  ensures ( len(dst) == [len(src)]pre ∧ return == [dst]pre )
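Contracts written over C expressions can also be checked at run time. Below is a minimal sketch of a wrapper for strcpy, with the hypothetical name `strcpy_checked`. C cannot query the allocation size of an arbitrary buffer, so the caller passes it as `dst_alloc`, standing in for alloc(dst). string(src) cannot be checked either, so it is assumed.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical runtime check of strcpy's contract. dst_alloc stands in for
   alloc(dst); string(src) is assumed, because strlen already relies on it. */
char *strcpy_checked(char *dst, size_t dst_alloc, const char *src) {
    size_t len_src_pre = strlen(src);      /* [len(src)]pre */
    char *dst_pre = dst;                   /* [dst]pre */
    assert(dst_alloc > len_src_pre);       /* requires alloc(dst) > len(src) */
    char *ret = strcpy(dst, src);          /* mod dst */
    assert(strlen(dst) == len_src_pre);    /* ensures len(dst) == [len(src)]pre */
    assert(ret == dst_pre);                /* ensures return == [dst]pre */
    return ret;
}
```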

safe_cat’s contract

void safe_cat(char* dst, int size, char* src)
  requires ( string(src) ∧ string(dst) ∧ alloc(dst) == size )
  mod dst
  ensures ( len(dst) <= [len(src)]pre + [len(dst)]pre ∧ len(dst) >= [len(dst)]pre )

Contracts and Soundness

• All errors are detected
  – Violation of a statement’s precondition, e.g. …a[i]…
  – Violation of a procedure’s precondition, at the call
  – Violation of a procedure’s postcondition, at the return

• Violation messages depend on the contracts

• But contracts may lead to more false alarms (e.g., trivial contracts)

CSSV Static Analysis

1. Inline contracts: expose the behavior of called procedures

2. Pointer analysis (global): find relationships between base addresses

3. Integer analysis: compute offset information

Step 1: Inliner

void safe_cat(char *dst, int size, char *src)
{ …
  strcpy(dst, src);
  …
}

void safe_cat(char *dst, int size, char *src)
  requires ( string(src) ∧ string(dst) ∧ alloc(dst) == size )
  mod dst
  ensures ( len(dst) == [len(src)]pre + [len(dst)]pre )

char* strcpy(char *dst, char *src)
  requires ( string(src) ∧ alloc(dst) > len(src) )
  mod dst
  ensures ( len(dst) == [len(src)]pre ∧ return == [dst]pre )

When inlining, the call strcpy(dst, src) is replaced by an assert of strcpy’s precondition followed by an assume of its postcondition.
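At the level of constraint variables, the inlined call becomes an assert of strcpy's precondition followed by an assume of its postcondition. The sketch below models that integer program concretely in C, under several assumptions:

- alloc(dst) is read as n1.asize − dst.offset.
- dst initially has offset 0.
- The postcondition checked is safe_cat's contract, len(dst) ≤ [len(src)]pre + [len(dst)]pre ∧ len(dst) ≥ [len(dst)]pre.
- The function and variable names are hypothetical.

`check_all` simply enumerates small pre-states. This is bounded testing of the integer program, not the abstract interpretation CSSV performs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical integer model of safe_cat after inlining strcpy's contract.
   n1 is the array dst points to, n2 the array src points to.
   Returns false if the inlined assert or safe_cat's postcondition fails;
   *post_len receives len(dst) measured from the original dst. */
bool safe_cat_model(int size, int dst_len, int src_len, int *post_len) {
    /* assume safe_cat's precondition: alloc(dst) == size, dst at offset 0 */
    int n1_asize = size, n1_len = dst_len, dst_offset = 0, n2_len = src_len;

    if (size > n2_len + (n1_len - dst_offset)) {         /* strlen(src) + strlen(dst) */
        dst_offset = dst_offset + (n1_len - dst_offset);  /* dst = dst + strlen(dst) */
        /* assert strcpy's precondition alloc(dst) > len(src) */
        if (!(n1_asize - dst_offset > n2_len)) return false;
        /* assume strcpy's postcondition len(dst) == [len(src)]pre,
           i.e. the first null of n1 now sits at dst_offset + n2_len */
        n1_len = dst_offset + n2_len;
    }
    *post_len = n1_len;
    /* safe_cat's postcondition */
    return n1_len <= src_len + dst_len && n1_len >= dst_len;
}

/* Bounded exhaustive test over pre-states with 0 <= dst_len < size
   (string(dst) needs its null inside the block) and 0 <= src_len < bound. */
bool check_all(int bound) {
    for (int size = 1; size <= bound; size++)
        for (int dl = 0; dl < size; dl++)
            for (int sl = 0; sl < bound; sl++) {
                int out;
                if (!safe_cat_model(size, dl, sl, &out)) return false;
            }
    return true;
}
```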

Step 2: Compute Pointer Information

• Required for reasoning about pointers

• Every base address is abstracted by an abstract location

• Relationships between base addresses are computed (points-to)

• Global analysis
  – Scalable
  – Imprecise

• Flow insensitive

• (Almost) context insensitive

Global Points-To

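main() {
  char s[10], t[20], r;
  char *p1, *p2;
  …
  p1 = r + i;
  safe_cat(s, 10, p1);
  p2 = r + j;
  safe_cat(t, 10, p2);
  …
}

safe_cat(char *dst, int size, char *src)
{ … strcpy(dst, src); … }

[Figure: global points-to graph: p1 and p2 point to r; within safe_cat, dst may point to s or t, and src points to r]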

Procedural Points-to (PPT)

• “Project” pointer information on visible variables of the procedure

• Introduce abstract locations for formal parameters

• Allow destructive updates through formal parameters (well-behaved programs)

• Can decrease precision in some procedures

PPT

[Figure: procedural points-to graph for safe_cat, in which dst points to the abstract location Param #1 and src to Param #2]

Step 3: Static Analysis

• Prove linear inequalities on string indices

• Abstract string properties using constraint variables

• Use abstract interpretation to conservatively interpret program statements

• Verify safety preconditions

Back to Semantics

[Figure: the instrumented memory state of safe_cat shown earlier, with base, offset and asize for every pointer]

Abstract Representation

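[Figure: abstract representation of the state: dst, size and src, with abstract locations n1 and n2 standing for the two character arrays]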

Base address relationship

[Figure: the concrete state again, with each concrete base address mapped to the abstract location that represents it]

Constraint Variables

• For every abstract location a: a.offset
  – e.g., src.offset = 9

Constraint Variables

• For every integer abstract location a: a.val
  – e.g., size.val = 125

Constraint Variables

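• For every abstract location a: a.is_nullt, a.len, a.asize

[Figure: array n1 indexed from 0, with n1.len marking the position of its first null character and n1.asize its allocated size]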

Abstract Representation

[Figure: dst, size and src with abstract locations n1 and n2]

dst.offset < n1.len

size.val + dst.offset = n1.asize

n1.is_nullt = true

n2.is_nullt = true

What does it represent?

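[Figure: a concrete array n1 of n1.asize bytes, null-terminated (n1.is_nullt = true), with its first null at index n1.len. dst points to index dst.offset < n1.len, and size.val covers the bytes from dst to the end of n1, so size.val + dst.offset = n1.asize. The remaining contents are unknown.]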
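Membership in the concretization can be made concrete. Compute the constraint variables of an actual state and test the four constraints of the abstract representation. Below is a minimal C sketch in which `LocVars`, `loc_vars` and `in_gamma` are hypothetical names. len is taken to be the index of the first null character and is meaningful only when `is_nullt` holds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical values of the constraint variables of one abstract location. */
typedef struct {
    bool is_nullt;   /* does the array contain a null character? */
    long len;        /* index of the first null; meaningful only if is_nullt */
    long asize;      /* allocated size in bytes */
} LocVars;

LocVars loc_vars(const char *a, size_t asize) {
    const char *nul = memchr(a, '\0', asize);
    LocVars v = { nul != NULL, nul != NULL ? (long)(nul - a) : 0, (long)asize };
    return v;
}

/* Is the concrete state (dst pointing dst_offset bytes into n1, src into n2,
   size holding size_val) in the concretization of the abstract value
   dst.offset < n1.len, size.val + dst.offset = n1.asize,
   n1.is_nullt = true, n2.is_nullt = true? */
bool in_gamma(const char *n1, size_t n1_size, const char *n2, size_t n2_size,
              long dst_offset, long size_val) {
    LocVars v1 = loc_vars(n1, n1_size), v2 = loc_vars(n2, n2_size);
    return v1.is_nullt && v2.is_nullt
        && dst_offset < v1.len
        && size_val + dst_offset == v1.asize;
}
```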

Abstract Interpretation

dst.offset < n1.len

size.val = n1.asize - dst.offset

dst = dst + strlen(dst);

dst.offset = n1.len

n1.asize - dst.offset < size.val ≤ n1.asize - dst.offset + n1.len

Verify Safety Condition

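Safety condition for dst = dst + i:

• Concrete semantics: offset(dst) + i ≤ asize(base(dst))

• Abstract semantics: dst.offset + i.val ≤ n1.asize, where n1 is the abstract location of base(dst)

[Figure: dst points dst.offset bytes into n1 and is advanced by i, which must stay within n1.asize]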
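Suppose, for illustration, that the abstraction kept only an interval for each constraint variable; CSSV uses polyhedra. The abstract safety check for dst = dst + i would then reduce to comparing bounds. The condition holds for every represented state iff the largest possible offset plus the largest possible i is at most the smallest possible asize. Below is a minimal sketch with a hypothetical `Interval` type.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical interval abstraction of one constraint variable: lo <= v <= hi. */
typedef struct { long lo, hi; } Interval;

/* Abstract check of offset(dst) + i <= asize(base(dst)). With independent
   intervals the condition holds in every represented state iff it holds in
   the worst case: max(dst.offset) + max(i.val) <= min(n1.asize). */
bool safe_add_intervals(Interval dst_offset, Interval i_val, Interval n1_asize) {
    return dst_offset.hi + i_val.hi <= n1_asize.lo;
}
```

Intervals cannot express relations such as size.val + dst.offset = n1.asize. A check like this therefore fails whenever the bounds are wide, even when such a relation guarantees safety. Keeping those relations is the reason for using a relational domain like polyhedra.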

The Assume-Operation

• Use two copies of constraint variables (values at the call and after it)

• Set modified values to ⊤

• Meet with the postcondition
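The three steps can be sketched over intervals, again an assumption since CSSV works with polyhedra. The contract used has the form mod len(dst), ensures len(dst) == [len(src)]pre + [len(dst)]pre, as in the inliner example. The copy `pre` is what lets the postcondition refer to the old value of len(dst) after that variable has been set to ⊤. Type and function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical interval domain: lo <= v <= hi (empty when lo > hi). */
typedef struct { long lo, hi; } Interval;

enum { LEN_SRC, LEN_DST, NVARS };          /* constraint variables len(src), len(dst) */
typedef struct { Interval v[NVARS]; } State;

static const Interval TOP = { LONG_MIN, LONG_MAX };

static long sat_add(long x, long y) {      /* addition saturating at LONG_MIN/LONG_MAX */
    if (y > 0 && x > LONG_MAX - y) return LONG_MAX;
    if (y < 0 && x < LONG_MIN - y) return LONG_MIN;
    return x + y;
}

static Interval meet(Interval a, Interval b) {
    Interval r = { a.lo > b.lo ? a.lo : b.lo, a.hi < b.hi ? a.hi : b.hi };
    return r;
}

/* assume for a call whose contract has mod {t} and ensures t == [a]pre + [b]pre. */
State assume_sum_post(State s, int t, int a, int b) {
    State pre = s;                         /* copy 1: values at the call ([.]pre) */
    s.v[t] = TOP;                          /* copy 2: the modified variable becomes top */
    Interval post = { sat_add(pre.v[a].lo, pre.v[b].lo),
                      sat_add(pre.v[a].hi, pre.v[b].hi) };
    s.v[t] = meet(s.v[t], post);           /* meet with the postcondition */
    return s;
}
```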

CSSV Implementation

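[Figure: the CSSV pipeline. Inputs are C files, contracts (Pre, Mod, Post) and a procedure name. Pointer Analysis computes the procedure’s pointer information. The Inliner turns the C files and contracts into C’ files. C2IP translates these, using the pointer information, into an integer procedure. Integer Analysis produces potential error messages.]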

Used Software

• ASToolKit [Microsoft]

• Core C [TAU - Greta Yorsh]

• GOLF [Microsoft - Manuvir Das]

• New Polka [Inria - Bertrand Jeannet]

Applications

• Verified a string library from Airbus with 6 false alarms
  – Could be avoided by analyzing correlated conditions

• Found 8 real errors in another string-intensive application, with 2 false alarms
  – In one case safety depends on correctness
  – Could be avoided by defensive programming

• 1 to 206 CPU seconds per procedure
  – No optimizations

• Very few false alarms

Related Work

Non-Conservative

• Wagner et al. [NDSS’00]

• LCLint’s extension [USENIX’01]

• Eau Claire [IEEE Oakland 02]

Conservative

• Polyspace verifier

• Dor, Rodeh and Sagiv [SAS’01]

Further work

• Derive contracts

• Improve efficiency

• Interprocedural

CSSV: Summary

• Semantics
  – Safety checking
  – Full C
  – Enables abstractions

• Contract language
  – String behavior
  – Omit pointer aliasing

• Procedural points-to
  – Scalable
  – Improve precision

• Static analysis
  – Tracks important string properties
  – Utilizes integer analysis

Foundation of A/G abstract interpretation

Greta Yorsh

www.cs.tau.ac.il/~gretay

Assume-Guarantee Reasoning using AI

T bar();          /* contract {pre_bar, post_bar} */

void foo() {      /* contract {pre_foo, post_foo} */
  assume[pre_foo];
  T p; ...
  p = bar();      /* analyzed as: assert[pre_bar]; assume[post_bar]; */
  ...
  assert[post_foo];
}

assert[φ](a): is γ(a) ⊆ ⟦φ⟧?

assume[φ](a) = α(γ(a) ∩ ⟦φ⟧)

[Figure: an abstract domain with values ⊤, a1, a2, a3, a4]

Goals

• Generic algorithms for assert & assume

• Effective

• Efficient

• Allow natural specifications

• Rather precise verification

Motivation

• New approach to using symbolic techniques in abstract interpretation
  – for shape analysis
  – for other analyses

• What does it mean to harness a decision procedure for use in static analysis?
  – What are the requirements?
  – What does it buy us?

What are the requirements ?

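• The symbolic concretization γ̂(a) is a formula describing exactly the concrete states represented by a:

S ∈ γ(a) ⇔ S ⊨ γ̂(a)

• Hence “Is γ(a) empty?” reduces to a satisfiability question: γ(a) is empty ⇔ γ̂(a) is unsatisfiable

Example (abstract values): a = [x ↦ 0, y ↦ ⊤, z ↦ 0] represents the concrete states [x ↦ 0, y ↦ 0, z ↦ 0], [x ↦ 0, y ↦ 1, z ↦ 0], [x ↦ 0, y ↦ 2, z ↦ 0], …; its formula is γ̂(a) = (x = 0) ∧ (z = 0)

Example (shapes): for an abstract list in which x points to node u1, followed by node u,

γ̂(a) = ∃v1, v2: node_u1(v1) ∧ node_u(v2) ∧ v1 ≠ v2 ∧ ∀v: (node_u1(v) ∨ node_u(v)) ∧ …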

What does it buy us ?

• Guarantee the most precise result w.r.t. the abstraction
  – best transformer
  – other abstract operations

• Modular reasoning
  – assume-guarantee reasoning
  – scalability

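The assume[φ](a) Operation

assume[φ](a) = α(γ(a) ∩ X), where X is the set of concrete states that satisfy φ; symbolically, assume[φ](a) = α̂(γ̂(a) ∧ φ)

[Figure: abstract value a, its formula γ̂(a), and the set X of concrete states satisfying φ]

The abstraction operation α̂(φ)

[Figure: α̂ maps a formula to abstract values, here a1 and a2, that represent its models]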
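For a finite toy domain, assume[φ](a) = α(γ(a) ∩ ⟦φ⟧) can be computed by brute force. This also shows why it is more precise than first abstracting φ and then taking a meet. The sketch assumes a constant-propagation domain over x, y, z, with values restricted to 0..3 so that γ(a) can be enumerated. All names are hypothetical, and enumeration stands in for the decision procedure used by the symbolic approach.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical constant-propagation domain over x, y, z. Values are restricted
   to 0..K-1 (an assumption that makes gamma(a) finite and enumerable). */
#define K 4
enum { BOT = -2, TOP = -1 };   /* any other component value is a constant in 0..K-1 */
typedef struct { int x, y, z; } Abs;
typedef bool (*Formula)(int x, int y, int z);

static bool in_gamma1(int a, int v) { return a == TOP || a == v; }

static int join1(int a, int v) {          /* join of abstract a with constant v */
    if (a == BOT) return v;
    return a == v ? a : TOP;
}

static int meet1(int a, int b) {
    if (a == TOP) return b;
    if (b == TOP) return a;
    return a == b ? a : BOT;
}

/* alpha of { S in gamma(a) | S satisfies phi }, by enumeration. */
static Abs alpha_of_filtered(Abs a, Formula phi) {
    Abs r = { BOT, BOT, BOT };
    for (int x = 0; x < K; x++)
        for (int y = 0; y < K; y++)
            for (int z = 0; z < K; z++)
                if (in_gamma1(a.x, x) && in_gamma1(a.y, y) && in_gamma1(a.z, z)
                    && phi(x, y, z)) {
                    r.x = join1(r.x, x);
                    r.y = join1(r.y, y);
                    r.z = join1(r.z, z);
                }
    return r;
}

/* Most precise: assume[phi](a) = alpha(gamma(a) intersected with [[phi]]). */
Abs assume_best(Abs a, Formula phi) { return alpha_of_filtered(a, phi); }

/* Less precise alternative: a meet alpha([[phi]]). */
Abs assume_naive(Abs a, Formula phi) {
    Abs top = { TOP, TOP, TOP };
    Abs p = alpha_of_filtered(top, phi);
    Abs r = { meet1(a.x, p.x), meet1(a.y, p.y), meet1(a.z, p.z) };
    return r;
}

/* Example formula phi: y == x + 1. */
bool y_is_x_plus_1(int x, int y, int z) { (void)z; return y == x + 1; }
```

Take a = [x ↦ 0, y ↦ ⊤, z ↦ 0] and φ ≡ y == x + 1. Then `assume_best` yields [x ↦ 0, y ↦ 1, z ↦ 0], whereas `assume_naive` keeps y ↦ ⊤, because α(⟦φ⟧) alone is ⊤ for every variable.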


assert[φ](a): is γ̂(a) ⇒ φ valid?

assume[φ](a) = α̂(γ̂(a) ∧ φ)

[Figure: an abstract domain with values ⊤, a1, a2, a3, a4]

Computing α̂(φ)

[Figure: α̂(φ) is computed as an answer ans of abstract values, e.g. a1, representing the models of φ]

3-Valued Logical Structures

• Relation meaning over {0, 1, ½}

• Kleene
  – 1: True
  – 0: False
  – ½: Unknown

• A join semi-lattice: 0 ⊔ 1 = ½
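Kleene's three-valued operators can be sketched by scaling the values by 2, so that ∧ is min, ∨ is max and ¬ is 2 − v. The join in the information order maps any two different values to ½. A minimal C sketch with hypothetical names:

```c
#include <assert.h>

/* Kleene values scaled by 2 so they fit in integers:
   0 = false, 1 = 1/2 (unknown), 2 = true. */
typedef enum { K_FALSE = 0, K_HALF = 1, K_TRUE = 2 } Kleene;

Kleene k_and(Kleene a, Kleene b) { return a < b ? a : b; }   /* min */
Kleene k_or(Kleene a, Kleene b)  { return a > b ? a : b; }   /* max */
Kleene k_not(Kleene a)           { return (Kleene)(2 - a); }  /* 1 - v, scaled */

/* Information-order join: equal values stay, otherwise 1/2 (so 0 join 1 = 1/2). */
Kleene k_join(Kleene a, Kleene b) { return a == b ? a : K_HALF; }
```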

Canonical Abstraction

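[Figure: a concrete cyclic list of nodes u1, …, u4 with x pointing to u1, every node satisfying c and r_x, and its canonical abstraction: u1 (pointed to by x) and a summary node u2, both satisfying c and r_x]

γ̂(a) ≜ ∃v1, v2: node_u1(v1) ∧ node_u2(v2)
  ∧ ∀w: node_u1(w) ∨ node_u2(w)
  ∧ ∀w1, w2: node_u1(w1) ∧ node_u1(w2) ⇒ (w1 = w2) ∧ ¬n(w1, w2)
  ∧ ∀v: r_x(v) ⇔ ∃v1: x(v1) ∧ n*(v1, v)
  ∧ ∀v: c(v) ⇔ ∃v1: n(v, v1) ∧ n*(v1, v)
  ∧ ∀v1, v2: x(v1) ∧ x(v2) ⇒ v1 = v2
  ∧ ∀v, v1, v2: n(v, v1) ∧ n(v, v2) ⇒ v1 = v2

The conjuncts defining r_x and c use transitive closure (n*, FOTC); the others are first-order (FO).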
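The node-merging step of canonical abstraction can be sketched for a concrete singly linked list. Compute the unary predicates x, r_x and c of every node and merge nodes with equal values. An abstract node standing for more than one concrete node is a summary node. The `Heap` representation and the function names are hypothetical. The sketch computes only the partition, not the 3-valued values of n on the abstract nodes.

```c
#include <assert.h>
#include <stdbool.h>

#define MAXN 16

/* Hypothetical concrete heap of count <= MAXN list nodes: next[v] is v's
   n-successor (-1 for NULL) and x is the node pointed to by x (-1 for NULL). */
typedef struct { int count; int next[MAXN]; int x; } Heap;

/* r_x(v) = exists v1: x(v1) and n*(v1, v), i.e. v is reachable from x. */
static bool r_x(const Heap *h, int v) {
    int u = h->x;
    for (int steps = 0; u != -1 && steps <= h->count; steps++, u = h->next[u])
        if (u == v) return true;
    return false;
}

/* c(v) = exists v1: n(v, v1) and n*(v1, v), i.e. v lies on a cycle. */
static bool on_cycle(const Heap *h, int v) {
    int u = h->next[v];
    for (int steps = 0; u != -1 && steps < h->count; steps++, u = h->next[u])
        if (u == v) return true;
    return false;
}

/* Merges nodes with equal values of the unary predicates x, r_x and c.
   canon[v] receives v's abstract node; summary[k] is true when abstract
   node k stands for more than one concrete node. Returns the number of
   abstract nodes. */
int canonical_abstraction(const Heap *h, int canon[], bool summary[]) {
    int keys[MAXN], members[MAXN], n_abs = 0;
    for (int v = 0; v < h->count; v++) {
        int key = ((v == h->x) << 2) | (r_x(h, v) << 1) | on_cycle(h, v);
        int k = 0;
        while (k < n_abs && keys[k] != key) k++;
        if (k == n_abs) { keys[n_abs] = key; members[n_abs] = 0; n_abs++; }
        members[k]++;
        canon[v] = k;
    }
    for (int k = 0; k < n_abs; k++) summary[k] = members[k] > 1;
    return n_abs;
}
```

Consider four nodes in a cycle with x pointing to the first. The sketch yields two abstract nodes: the first node alone, and a summary node for the other three. This matches the slide's abstraction into u1 and a summary node u2.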

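Example: α̂(φ) for the condition y == x->n, where

φ ≜ ∀v1: y(v1) ⇔ ∃v2: x(v2) ∧ n(v2, v1)

[Figure: starting from ans = ⊤, the computation produces abstract structures in which x points to u1 and y points to u1’s successor, represented by node u2 or uy]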

Example - Materialization

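[Figure: in an abstract structure where x points to u1 and the value of y on the summary node u2 is unknown, that value is refined in three ways: y(u2) = 0; y(u2) = 1; or materialization, which splits u2 into uy with y(uy) = 1 and u2 with y(u2) = 0. Each candidate a is kept only if γ̂(a) ∧ φ is satisfiable, where φ is the formula for y == x->n.]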

Abstract Operations

• α̂(φ): the best abstract value that represents φ

• What does it buy us?

• assume[φ](a) = α̂(γ̂(a) ∧ φ)
  – assume-guarantee reasoning
  – pre- and post-conditions specified by logical formulas

• BT(t, a) = α̂(γ̂(extend(a)) ∧ t)
  – best abstract transformer
  – parametric abstractions

• meet(a1, a2) = α̂(γ̂(a1) ∧ γ̂(a2))

SPASS Experience

• Handles arbitrary FO formulas

• Can diverge: use a timeout

• Converges in our examples
  – Captures older shape analysis algorithms

• How to handle FOTC?
  – Overapproximations lead to too many structures

Decidable Transitive-Closure Logic

Neil Immerman (UMASS), Alexander Rabinovich (TAU)

• ∃∀(TC, f) is a subset of FOTC
  – exists-forall form
  – arbitrary unary relations
  – single function f

• Decidable for satisfiability
  – NEXPTIME-complete

• Any “reasonable” extension is undecidable

• Rather limited

Simulation Technique (CAV’04)

Neil Immerman (UMASS), Alexander Rabinovich (TAU)

• Simulate realistic data structures using decidable logic over tractable structures
  – Singly linked lists: shared/cyclic/nested
  – Doubly linked lists
  – Trees

• Preserved under mutations

• Abstract interpretation, Hoare-style verification

Further Work

• Implementation

• Decidable logic for shape analysis

• Assume-guarantee of “real” programs
  – case study: Java Collection (B. Livshits, Noam)
  – estimate side-effects (A. Skidanov)
  – specification language
  – write procedure specifications

• Extend to other domains
  – infinite-height

• Tune the abstraction based on specification

Summary

• A/G Approach can scale program analysis/verification

• But requires some effort
  – Language designers
  – Programmers
  – Abstract interpretation
  – Efficient runtime testing
