assume/guarantee reasoning using abstract interpretation
Assume/Guarantee Reasoning using Abstract Interpretation
Nurit Dor, Tom Reps, Greta Yorsh, Mooly Sagiv
Limitations of Whole Program Analysis
• Complexity of Chaotic Iterations
• Not all the source code is available
– Large libraries
– Software components
• No interaction with the client
– Program design
A Motivating Example
List rev(List x) {
  if (x == null) return null;
  return append(rev(x->next), x);
}
List append(List x, List y) {
  List e;
  if (x == null) return y;
  e = malloc(…);
  e->data = x->data;
  e->next = append(x->next, y);
  return e;
}
Contract for rev:
List rev(List x)
requires acyclic(x)
ensures $$ = reverse(x)
Contract for append:
List append(List x, List y)
requires acyclic(x) ∧ acyclic(y)
ensures $$ = x || y
Can also be used for runtime testing
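The remark about runtime testing can be made concrete. The following is a minimal C sketch, not the authors' tooling: the requires and ensures clauses of rev and append are checked with assert. The helpers acyclic, cons, length, nth, and is_reverse_of are hypothetical names introduced only for this illustration. As a further assumption, rev here appends a fresh one-element list holding the head's data, so that the result is exactly the reversed list and the ensures clause can be checked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct Node { int data; struct Node *next; } Node, *List;

/* acyclic(x): Floyd's tortoise-and-hare cycle check. */
static bool acyclic(List x) {
    List slow = x, fast = x;
    while (fast != NULL && fast->next != NULL) {
        slow = slow->next;
        fast = fast->next->next;
        if (slow == fast) return false;
    }
    return true;
}

/* Allocate one list cell (memory is never freed in this sketch). */
static List cons(int data, List next) {
    List e = malloc(sizeof *e);
    assert(e != NULL);
    e->data = data;
    e->next = next;
    return e;
}

static size_t length(List x) { size_t n = 0; for (; x; x = x->next) n++; return n; }
static int nth(List x, size_t i) { while (i--) x = x->next; return x->data; }

/* Executable form of "$$ = reverse(x)": same length, mirrored elements. */
static bool is_reverse_of(List r, List x) {
    size_t n = length(x);
    if (length(r) != n) return false;
    for (size_t i = 0; i < n; i++)
        if (nth(r, i) != nth(x, n - 1 - i)) return false;
    return true;
}

/* requires acyclic(x) && acyclic(y); returns a copy of x followed by y. */
static List append(List x, List y) {
    assert(acyclic(x) && acyclic(y));
    if (x == NULL) return y;
    return cons(x->data, append(x->next, y));
}

/* requires acyclic(x); ensures the result is the reverse of x. */
static List rev(List x) {
    assert(acyclic(x));
    List r = (x == NULL) ? NULL
                         : append(rev(x->next), cons(x->data, NULL));
    assert(is_reverse_of(r, x));
    return r;
}
```

Checking the postcondition at every recursive call is quadratic per call, which illustrates why the cost of dynamic checks is itself a design issue for contracts.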
Challenges in A/G Reasoning
• Specifying procedure contracts
• Performing abstract interpretation using contracts
Specifying Contracts
• Executable specifications
– assert
– Can use loops
– Expressive
– Natural
– But what about side-effects?
• Declarative specifications
– Types
– First order logic
– Z
• Hybrid
– Larch
– Java Modeling Language
Procedure Contracts and Modularity
• The postcondition does not reveal the whole story
void foo(List x, List z) {
List y, t ;
y = rev(x);
t = rev(z);
}
List rev(List x)
requires acyclic(x)
ensures $$=reverse(x)
void foo(List x, List z)
requires acyclic(x) ∧ acyclic(z)
ensures true
Procedure Contracts and Modularity
• Specify parts of the state which may be modified
• But it is difficult to define potential side-effects
• Can use abstract interpretation
Issues in Specifying Contracts
• Expressible
• Conciseness
• Natural
• Reuse
• Cost of dynamic check (model checking)
• Decidability
• Cost of abstract interpretation
Plan
• CSSV: A tool for verifying absence of buffer overruns (N. Dor)
• An algorithm for performing abstract interpretation in the most precise way using specification
CSSV: Towards a Realistic Tool for Statically Detecting All Buffer Overflows in C
Nurit Dor, Michael Rodeh, Mooly Sagiv
DAEDALUS project
/* from web2c [strpascal.c] */
void foo(char *s)
{
  while ( *s != ' ' )
    s++;
  *s = 0;
}
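The loop above scans for a blank without checking for the end of the string or of the buffer, so an input with no blank makes it read and then write out of bounds. As a hedged contrast, here is a bounded variant; the name foo_bounded and the extra parameter n (the number of bytes allocated at s) are assumptions of this sketch and do not come from web2c.

```c
#include <stddef.h>

/* Hypothetical bounded variant of foo: n is the number of bytes
   allocated at s. Stops at a blank, at the terminator, or at the
   end of the buffer. Returns 1 if a blank or terminator was found
   (and overwritten with 0), and 0 if the buffer was exhausted. */
int foo_bounded(char *s, size_t n) {
    size_t i = 0;
    while (i < n && s[i] != ' ' && s[i] != '\0')
        i++;
    if (i == n)
        return 0;   /* no blank inside the buffer: write nothing */
    s[i] = 0;
    return 1;
}
```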
Vulnerabilities of C programs
Null dereference
Dereference to unallocated storage
Out of bound pointer arithmetic
Out of bound update
Is it common?
• General belief – yes!
• FUZZ study
– Test reliability by random input
– Tens of applications on 9 different UNIX systems
– 18% – 23% hang or crash
• CERT advisory
– Up to 50% of attacks are due to buffer overflow
COMMON AND DANGEROUS
CSSV’s Goals
• Efficient conservative static checking algorithm
– Verify the absence of buffer overflow
  • not just finding bugs
– All C constructs
  • Pointer arithmetic, casting, dynamic memory, …
– Real programs
– Minimum false alarms
Verifying Absence of Buffer Overflow is non-trivial
void safe_cat(char *dst, int size, char *src)
{
  if ( size > strlen(src) + strlen(dst) ) {
    dst = dst + strlen(dst);
    strcpy(dst, src);
  }
}
Before strcpy(dst, src):
{string(src) ∧ alloc(dst) > len(src)}
Before dst = dst + strlen(dst):
{string(src) ∧ string(dst) ∧ alloc(dst+len(dst)) > len(src)}
Before the if statement:
{string(src) ∧ string(dst) ∧ (size > len(src)+len(dst) ⇒ alloc(dst+len(dst)) > len(src))}
Can this be done for real programs?
• Complex linear relationships: use polyhedra [CH78]
• Pointer arithmetic: pointer analysis
• Loops: widening
• Procedures: procedure contracts
Very few false alarms!
Linear Relation Analysis
Cousot and Halbwachs, 78
Statically analyze program variable relations:
a1*var1 + a2*var2 + … + an*varn ≤ b
Example polyhedron: y ≥ 1, x + y ≥ 3, -x + y ≤ 1
[Figure: the polyhedron plotted in the x-y plane, both axes ranging from 0 to 3]
Vertices V = { (1,2), (2,1) }, rays R = { (1,0), (1,1) }
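A small C sketch can check the two representations of the example polyhedron against each other. It reads the slide's constraints as y ≥ 1, x + y ≥ 3, -x + y ≤ 1, a reading that is consistent with the listed vertices V and rays R. The type Constraint and the functions in_polyhedron and from_generators are hypothetical names for this illustration. This is not the polyhedra library used by CSSV.

```c
#include <stdbool.h>

/* One linear constraint a*x + b*y <= c. */
typedef struct { double a, b, c; } Constraint;

/* y >= 1, x + y >= 3, -x + y <= 1, each rewritten as a*x + b*y <= c. */
static const Constraint P[] = {
    { 0.0, -1.0, -1.0},   /* -y     <= -1 */
    {-1.0, -1.0, -3.0},   /* -x - y <= -3 */
    {-1.0,  1.0,  1.0},   /* -x + y <=  1 */
};

/* Constraint representation: a point belongs to the polyhedron
   if and only if it satisfies every constraint. */
bool in_polyhedron(double x, double y) {
    for (unsigned i = 0; i < sizeof P / sizeof P[0]; i++)
        if (P[i].a * x + P[i].b * y > P[i].c)
            return false;
    return true;
}

/* Generator representation: a convex combination of the vertices
   (1,2) and (2,1), plus a nonnegative combination of the rays
   (1,0) and (1,1). Expects 0 <= lambda <= 1, mu1 >= 0, mu2 >= 0. */
void from_generators(double lambda, double mu1, double mu2,
                     double *x, double *y) {
    *x = lambda * 1.0 + (1.0 - lambda) * 2.0 + mu1 * 1.0 + mu2 * 1.0;
    *y = lambda * 2.0 + (1.0 - lambda) * 1.0 + mu1 * 0.0 + mu2 * 1.0;
}
```

Every point produced by from_generators with valid coefficients should pass in_polyhedron, which is a quick sanity check that the two descriptions denote the same set.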
C String Static Verifier
• Detects string violations
– Buffer overflow (update beyond bounds)
– Unsafe pointer arithmetic
– References beyond null termination
– Unsafe library calls
• Handles full C
– Multi-level pointers, pointer arithmetic, structures, casting, …
• Applied to real programs
– Public domain software
– C code from Airbus
Plan
• Semantics for C program
• Contract language
• Static analysis algorithm
• Implementation
Standard C Semantics
void safe_cat( char *dst, int size, char *src )
{
  if ( size > strlen(src) + strlen(dst) ) {
    dst = dst + strlen(dst);
    strcpy(dst, src);
  }
}
[Figure: a concrete memory state for safe_cat. The variable dst is stored at 0x480580, size at 0x480584 (value 125), and src at 0x480588; dst and src hold the addresses of two null-terminated character buffers.]
Instrumented C Semantics
[Figure: the same memory state in the instrumented semantics. Every location is annotated with its base address and allocation size (base, asize), and every pointer value with its offset from the base; for example, src holds 0x6000009, which is base 0x6000000 with offset 9.]
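The instrumented semantics checks the validity of C expressions (ANSI C cleanness)
Statement: dst = dst + i
Safety condition: offset(dst) + i ≤ asize(base(dst))
[Figure: a buffer starting at base(dst) of size asize(base(dst)); dst points offset(dst) bytes into it and is advanced by i.]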
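The safety condition for dst = dst + i can be sketched as a checked "fat pointer" in C. This is an illustration under assumptions, not CSSV's instrumented semantics: the type IPtr and the functions iptr_add and iptr_can_deref are hypothetical, the relation in the safety condition is read as ≤ (a pointer one past the end is allowed, as in C), and the lower bound 0 ≤ offset + i is an added assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* A pointer together with the instrumentation the semantics tracks. */
typedef struct {
    char  *base;    /* base address of the allocated region   */
    size_t asize;   /* allocation size of the region in bytes */
    size_t offset;  /* current offset of the pointer from base */
} IPtr;

/* dst = dst + i, performed only if 0 <= offset + i <= asize.
   Returns false and leaves the pointer unchanged on a violation. */
bool iptr_add(IPtr *p, long i) {
    long target = (long)p->offset + i;
    if (target < 0 || (size_t)target > p->asize)
        return false;
    p->offset = (size_t)target;
    return true;
}

/* A dereference is safe only strictly inside the region. */
bool iptr_can_deref(const IPtr *p) {
    return p->offset < p->asize;
}
```

A static analysis proves the same inequality for all executions instead of testing it on one run.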
Contracts
• Defined in the instrumented semantics
• Specify string behavior of procedures (C expressions)
– Precondition
– Postcondition
  • Use of values at procedure entry
– Side-effects
  • Can be approximated from pointer information
• No need to specify pointer information
– Not aiming for modular pointer analysis
Contracts’ Advantages
• Modular analysis
– Use contracts on call statements
– Not all the code is available
– Enable more expensive analyses
• User control of the verification
– Detect errors at point of logical error
– Improve the precision of the analysis
• Check additional properties
– Beyond ANSI-C
Example
char* strcpy(char* dst, char* src)
requires ( string(src) ∧ alloc(dst) > len(src) )
mod dst
ensures ( len(dst) == [len(src)]pre ∧ return == [dst]pre )
safe_cat’s contract
void safe_cat(char* dst, int size, char* src)
requires ( string(src) ∧ string(dst) ∧ alloc(dst) == size )
mod dst
ensures ( len(dst) <= [len(src)]pre + [len(dst)]pre ∧ len(dst) >= [len(dst)]pre )
Contracts and Soundness
• All errors are detected
– Violation of statement's precondition
  • …a[i]…
– Violation of procedure's precondition
  • Call
– Violation of procedure's postcondition
  • Return
• Violation messages depend on the contracts
• But may lead to more false alarms (e.g., trivial contracts)
CSSV Static Analysis
1. Inline contracts
• Expose behavior of called procedures
2. Pointer analysis (global)
• Find relationship between base addresses
3. Integer analysis
• Compute offset information
Step 1: Inliner
void safe_cat( char *dst, int size, char *src )
{ …
  strcpy(dst, src);
  …
}
void safe_cat( char *dst, int size, char *src )
requires ( string(src) ∧ string(dst) ∧ alloc(dst) == size )
mod dst
ensures ( len(dst) == [len(src)]pre + [len(dst)]pre )
char* strcpy( char *dst, char *src )
requires ( string(src) ∧ alloc(dst) > len(src) )
mod dst
ensures ( len(dst) == [len(src)]pre ∧ return == [dst]pre )
• Contract of safe_cat: the requires clause is inlined as an assume, the ensures clause as an assert
• Contract of strcpy at the call: the requires clause is inlined as an assert, the ensures clause as an assume
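What the inliner produces can be approximated by hand in C. The sketch below is an assumption-laden illustration, not CSSV output: ASSUME and ASSERT are hypothetical macros that both expand to a runtime assert (a static analyzer would instead take an assume as a fact and try to prove an assert), alloc(dst) is passed explicitly as dst_alloc because plain C cannot query it, and string(src) and string(dst) are not checked because they cannot be tested at run time. The final assertions use the two-sided postcondition (len(dst) between [len(dst)]pre and [len(src)]pre + [len(dst)]pre), which also holds when the copy is skipped.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the analyzer's assume/assert statements. */
#define ASSUME(c) assert(c)
#define ASSERT(c) assert(c)

/* safe_cat with its own contract and strcpy's contract inlined.
   dst_alloc models alloc(dst) and is an assumption of this sketch. */
void safe_cat_inlined(char *dst, int size, char *src, size_t dst_alloc) {
    /* assume: precondition of safe_cat, alloc(dst) == size */
    ASSUME(dst_alloc == (size_t)size);
    size_t len_src_pre = strlen(src);   /* [len(src)]pre */
    size_t len_dst_pre = strlen(dst);   /* [len(dst)]pre */

    if ((size_t)size > len_src_pre + len_dst_pre) {
        char  *d       = dst + len_dst_pre;
        size_t d_alloc = dst_alloc - len_dst_pre;
        /* assert: precondition of strcpy, alloc(d) > len(src) */
        ASSERT(d_alloc > strlen(src));
        strcpy(d, src);
        /* assume: postcondition of strcpy, len(d) == [len(src)]pre */
        ASSUME(strlen(d) == len_src_pre);
    }

    /* assert: postcondition of safe_cat */
    ASSERT(strlen(dst) <= len_src_pre + len_dst_pre);
    ASSERT(strlen(dst) >= len_dst_pre);
}
```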
Step 2: Compute Pointer Information
• Required for reasoning about pointers
• Every base address is abstracted by an abstract location
• Relationships between base addresses are computed (points-to)
• Global analysis
– Scalable
– Imprecise
  • Flow insensitive
  • (Almost) Context insensitive
Global Points-To
main() {
  char s[10], t[20], r;
  char *p1, *p2;
  …
  p1 = r + i;
  safe_cat(s,10,p1);
  p2 = r + j;
  safe_cat(t,10,p2);
  …
}
safe_cat( char *dst, int size, char *src )
{ … strcpy(dst, src); … }
[Figure: global points-to graph over s, t, r, p1, p2, dst, and src]
Procedural Points-to (PPT)
• “Project” pointer information on visible variables of the procedure
• Introduce abstract locations for formal parameters
• Allow destructive updates through formal parameters (well behaved programs)
• Can decrease precision in some procedures
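PPT
[Figure: procedural points-to graph for safe_cat, in which dst and src point to the abstract locations Param #1 and Param #2]
safe_cat( char *dst, int size, char *src )
{ … strcpy(dst, src); … }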
Step 3: Static Analysis
• Prove linear inequalities on string indices
• Abstract string properties using constraint variables
• Use abstract interpretation to conservatively interpret program statements
• Verify safety preconditions
Back to Semantics
[Figure: the instrumented memory state of safe_cat, with base, asize, and offset annotations, repeated from the Instrumented C Semantics slide]
Abstract Representation
[Figure: the concrete memory state next to its abstraction, which consists of the variables src, dst, and size and the abstract locations n1 and n2; the edges record the base address relationship]
Constraint Variables
• For every abstract location a: a.offset
– Example: src.offset = 9
• For every integer abstract location a: a.val
– Example: size.val = 125
• For every abstract location a: a.is_nullt, a.len, a.asize
– Example: n1.len and n1.asize, both measured from offset 0 of n1
Abstract Representation
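[Figure: the variables src, dst, and size and the abstract locations n1 and n2, annotated with the following constraints]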
dst.offset < n1.len
size.val+ dst.offset = n1.asize
n1.is_nullt = true
n2.is_nullt = true
What does it represent?
[Figure: the concrete buffers represented by the abstract value, with unknown contents shown as "?". The diagram marks the constraints on buffer n1:]
n1.is_nullt = true
dst.offset < n1.len
size.val + dst.offset = n1.asize
Abstract Interpretation
dst.offset < n1.len
size.val = n1.asize - dst.offset
dst = dst + strlen(dst);
dst.offset = n1.len
size.val = n1.asize - dst.offset + n1.len
Verify Safety Condition
dst = dst + i
Concrete semantics: offset(dst) + i ≤ asize(base(dst))
Abstract semantics: dst.offset + i.val ≤ n1.asize
[Figure: the buffer of size asize(base(dst)) with dst at offset(dst) advanced by i, next to its abstraction n1 with dst.offset and n1.asize]
The Assume-Operation
• Use two copies of constraint variables
• Set modified values to ⊤
• Meet with the postcondition
CSSV Implementation
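[Figure: CSSV pipeline. Inputs: C files, contracts (Pre, Mod, Post), and a procedure name. Pointer Analysis produces the procedure's pointer info; the Inliner turns the C files into C' files; C2IP translates them into an integer procedure; Integer Analysis reports potential error messages.]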
Used Software
• ASToolKit [Microsoft]
• Core C [TAU - Greta Yorsh]
• GOLF [Microsoft - Manuvir Das]
• New Polka [Inria - Bertrand Jeannet]
Applications
• Verified string library from Airbus with 6 false alarms
– Could be avoided by analyzing correlated conditions
• Found 8 real errors in another string intensive application with 2 false alarms
– In one case safety depends on correctness
– Could be avoided by defensive programming
• 1 - 206 CPU seconds per procedure
– No optimizations
• Very few false alarms
Related Work
Non-Conservative
• Wagner et al. [NDSS’00]
• LCLint’s extension [USENIX’01]
• Eau Claire [IEEE Oakland 02]
Conservative
• Polyspace verifier
• Dor, Rodeh and Sagiv [SAS’01]
Further work
• Derive contracts
• Improve efficiency
• Interprocedural
CSSV: Summary
• Semantics
– Safety checking
– Full C
– Enables abstractions
• Contract language
– String behavior
– Omit pointer aliasing
• Procedural points-to
– Scalable
– Improve precision
• Static analysis
– Tracks important string properties
– Utilizes integer analysis
Foundation of A/G abstract interpretation
Greta Yorsh
www.cs.tau.ac.il/~gretay
Assume-Guarantee Reasoning using AI
T bar();
void foo() {
T p;...
p = bar();
...
}
{pre_bar, post_bar}
{pre_foo, post_foo}
assume[pre_foo];
assert[pre_bar];
-----------
assume[post_bar];
assert[post_foo];
assert[φ](a): Is γ(a) ⊆ ⟦φ⟧ ?
assume[φ](a): α(γ(a) ∩ ⟦φ⟧)
Abstract values at the successive program points: ⟨⊤⟩, ⟨a1⟩, ⟨a2⟩, ⟨a3⟩, ⟨a4⟩
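The two operations can be illustrated on a much simpler abstract domain than the one used in the talk. The sketch below swaps in intervals over a single integer variable. This is an assumption of the example: the names Itv, ITV_TOP, itv_assume, itv_assert, and itv_is_empty are hypothetical, and formulas are restricted to range constraints lo ≤ x ≤ hi. In this setting, assume restricts the abstract value to the states that satisfy the formula and re-abstracts them, which for intervals is plain intersection. Assert checks that every state represented by the abstract value satisfies the formula.

```c
#include <limits.h>
#include <stdbool.h>

/* Interval abstract value: the set of integers in [lo, hi].
   An interval with lo > hi represents the empty set. */
typedef struct { long lo, hi; } Itv;

static const Itv ITV_TOP = { LONG_MIN, LONG_MAX };

bool itv_is_empty(Itv a) { return a.lo > a.hi; }

/* assume[phi](a): keep only the states of a that satisfy phi.
   For a range constraint phi this is interval intersection,
   which is the most precise result in the interval domain. */
Itv itv_assume(Itv a, Itv phi) {
    Itv r;
    r.lo = a.lo > phi.lo ? a.lo : phi.lo;
    r.hi = a.hi < phi.hi ? a.hi : phi.hi;
    return r;
}

/* assert[phi](a): does every state represented by a satisfy phi? */
bool itv_assert(Itv a, Itv phi) {
    return itv_is_empty(a) || (phi.lo <= a.lo && a.hi <= phi.hi);
}
```

Analyzing foo then amounts to starting from ITV_TOP, applying itv_assume with foo's precondition, checking bar's precondition with itv_assert at the call, applying itv_assume with bar's postcondition after it, and finally checking foo's postcondition with itv_assert.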
Goals
• Generic algorithms for assert & assume
• Effective
• Efficient
• Allow natural specifications
• Rather precise verification
Motivation
• New approach to using symbolic techniques in abstract interpretation
– for shape analysis
– for other analyses
• What does it mean to harness a decision procedure for use in static analysis?
– what are the requirements?
– what does it buy us?
What are the requirements ?
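[Figure: three columns, Formulas, Concrete, and Abstract. An abstract value a denotes the concrete states γ(a) and is described by the formula γ̂(a).]
S ∈ γ(a) ⇔ S ⊧ γ̂(a)
Is γ(a) non-empty? ⇔ Is γ̂(a) satisfiable?
Example:
Abstract value a: [x ↦ 0, y ↦ ⊤, z ↦ 0]
Concrete states γ(a): [x ↦ 0, y ↦ 0, z ↦ 0], [x ↦ 0, y ↦ 1, z ↦ 0], [x ↦ 0, y ↦ 2, z ↦ 0], …
Formula γ̂(a): (x = 0) ∧ (z = 0)
Example for shape analysis:
[Figure: concrete lists pointed to by x and their abstract value with nodes u1 and u]
Formula: ∃v1,v2 : node_u1(v1) ∧ node_u(v2) ∧ v1 ≠ v2 ∧ ∀v : node_u1(v) ∨ node_u(v) ∧ . . .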
What does it buy us ?
• Guarantee the most-precise result w.r.t. the abstraction
– best transformer
– other abstract operations
• Modular reasoning
– assume-guarantee reasoning
– scalability
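The assume[φ](a) Operation
[Figure: three columns, Formulas, Concrete, and Abstract. The abstract value a is mapped to the formula γ̂(a); conjoining φ gives γ̂(a) ∧ φ; abstracting that formula gives assume[φ](a) = α̂(γ̂(a) ∧ φ).]
The abstraction operation α̂(φ)
[Figure: three columns, Formulas, Concrete, and Abstract, with abstract values a1 and a2]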
Assume-Guarantee Reasoning using AI
assert[φ](a): Is γ̂(a) ⇒ φ valid?
assume[φ](a) = α̂(γ̂(a) ∧ φ)
Computing α̂(φ)
[Figure: three columns, Formulas, Concrete, and Abstract, with the abstract values ans, ⊤, and a1]
3-Valued Logical Structures
• Relation meaning over {0, 1, ½}
• Kleene
– 1: True
– 0: False
– ½: Unknown
• A join semi-lattice: 0 ⊔ 1 = ½
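Kleene's three truth values and the information-order join are easy to write down in C. This is a minimal sketch; the enum Kleene and the functions k_and, k_or, k_not, and k_join are hypothetical names. The encoding orders the values as 0 < ½ < 1 so that minimum and maximum implement conjunction and disjunction.

```c
/* Truth values ordered 0 < 1/2 < 1, so that min and max
   implement Kleene conjunction and disjunction. */
typedef enum { K_FALSE = 0, K_HALF = 1, K_TRUE = 2 } Kleene;

Kleene k_and(Kleene a, Kleene b) { return a < b ? a : b; }
Kleene k_or(Kleene a, Kleene b)  { return a > b ? a : b; }
Kleene k_not(Kleene a)           { return (Kleene)(K_TRUE - a); }

/* Information-order join: equal values are kept, and any two
   different values join to 1/2 (unknown). In particular,
   the join of 0 and 1 is 1/2. */
Kleene k_join(Kleene a, Kleene b) { return a == b ? a : K_HALF; }
```

The join is what merges several concrete truth values into one abstract one: a relation that is true for some represented nodes and false for others is recorded as ½.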
Canonical Abstraction
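[Figure: a concrete structure with x and nodes u1, u2, u3, u4, each labeled c, rx, and its canonical abstraction a with x and nodes u1 and u2, each labeled c, rx]
γ̂(a) ≜
  ∃v1,v2: node_u1(v1) ∧ node_u2(v2)
  ∧ ∀w: node_u1(w) ∨ node_u2(w)
  ∧ ∀w1,w2: node_u1(w1) ∧ node_u1(w2) ⇒ (w1 = w2) ∧ ¬n(w1,w2)
  ∧ ∀v: rx(v) ⇔ ∃v1: x(v1) ∧ n*(v1,v)
  ∧ ∀v: c(v) ⇔ ∃v1: n(v,v1) ∧ n*(v1,v)
  ∧ ∀v1,v2: x(v1) ∧ x(v2) ⇒ v1 = v2
  ∧ ∀v,v1,v2: n(v,v1) ∧ n(v,v2) ⇒ v1 = v2
Logics marked on the slide: FO, FOTC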
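y == x->n
φ ≜ ∀v1: y(v1) ↔ ∃v2: x(v2) ∧ n(v2, v1)
[Figure: three columns, Formulas, Concrete, and Abstract. Starting from ⊤, abstract structures over x, u1, u2, and uy with y marked are combined into the answer ans, which is α̂(φ).]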
Example - Materialization
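y == x->n
[Figure: an abstract structure in which x points to u1, followed by u2, and the value of y on u2 is unknown. Three cases are shown: y(u2) = 0; y(u2) = 1; and materialization, which splits u2 into uy and u2 with y(uy) = 1 and y(u2) = 0.]
For each case: is γ̂(a) ∧ φ satisfiable?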
Abstract Operations
• α̂(φ) – best abstract value that represents φ
• What does it buy us?
• assume[φ](a) = α̂( γ̂(a) ∧ φ )
– assume-guarantee reasoning
– pre- and post-conditions specified by logical formulas
• BT(t,a) = α̂( γ̂(extend(a)) ∧ t )
– best abstract transformer
– parametric abstractions
• meet(a1, a2) = α̂( γ̂(a1) ∧ γ̂(a2) )
SPASS Experience
• Handles arbitrary FO formulas
• Can diverge
– use timeout
• Converges in our examples
– Captures older shape analysis algorithms
• How to handle FOTC?
– Overapproximations lead to too many structures
Decidable Transitive-closure Logic
• Neil Immerman (UMASS), Alexander Rabinovich (TAU)
• ∃∀(TC,f) is a subset of FOTC
– exist-forall form
– arbitrary unary relations
– single function f
• Decidable for satisfiability
– NEXPTIME-complete
• Any “reasonable” extension is undecidable
• Rather limited
Simulation Technique – CAV’04
• Neil Immerman (UMASS), Alexander Rabinovich (TAU)
• Simulate realistic data structures using decidable logic over tractable structures
– Singly linked list - shared/cyclic/nested
– Doubly linked list
– Trees
• Preserved under mutations
• Abstract interpretation, Hoare-style verification
Further Work
• Implementation
• Decidable logic for shape analysis
• Assume-guarantee of “real” programs
– case study: Java Collection (B. Livshits, Noam)
– Estimate side-effects (A. Skidanov)
– specification language
– write procedure specifications
• Extend to other domains
– Infinite-height
• Tune the abstraction based on specification
Summary
• A/G Approach can scale program analysis/verification
• But requires some effort
– Language designers
– Programmers
– Abstract interpretation
– Efficient runtime testing