Post on 19-Dec-2015
Shape Analysis via 3-Valued Logic
Mooly Sagiv, Tel Aviv University
http://www.cs.tau.ac.il/~msagiv/toplas02.ps
www.cs.tau.ac.il/~tvla
Topics
• A new abstract domain for static analysis
• Abstract dynamically allocated memory
• TVLA: A system for generating abstract interpreters
• Applications
Motivation
• Dynamically allocated storage and pointers are essential programming tools
– Object oriented
– Modularity
– Data structure
• But
– Error prone
– Inefficient
• Static analysis can be very useful here
A Pathological C Program
a = malloc(…) ;
b = a;
free (a);
c = malloc (…);
if (b == c) printf("unexpected equality");
Dereference of NULL pointers
typedef struct element {
    int value;
    struct element *next;
} Elements;

bool search(int value, Elements *c) {
    Elements *elem;
    for (elem = c; c != NULL; elem = elem->next)
        if (elem->value == value)
            return TRUE;
    return FALSE;
}
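Potential null dereference: the loop guard tests c, which never changes, instead of elem, so when the value is not found elem eventually becomes NULL and is dereferenced.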
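A corrected variant can be sketched as follows. It is an illustration, not the slide's code: the guard tests elem, the pointer that is actually dereferenced, and the standard bool type replaces the TRUE/FALSE macros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct element {
    int value;
    struct element *next;
} Elements;

/* Returns true if some cell of the list starting at c holds value.
   The loop guard tests elem, so elem is never dereferenced when NULL. */
bool search(int value, const Elements *c) {
    for (const Elements *elem = c; elem != NULL; elem = elem->next)
        if (elem->value == value)
            return true;
    return false;
}
```

With this guard the loop also handles the empty list, since the body is never entered when c is NULL.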
Memory leakage
typedef struct element {
    int value;
    struct element *next;
} Elements;

Elements* reverse(Elements *c)
{
    Elements *h, *g;
    h = NULL;
    while (c != NULL) {
        g = c->next;
        h = c;
        c->next = h;
        c = g;
    }
    return h;
}
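Leakage of the address pointed to by h: the assignment h = c overwrites the only reference to the part of the list reversed so far.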
Memory leakage
Elements* reverse(Elements *c)
{
    Elements *h, *g;
    h = NULL;
    while (c != NULL) {
        g = c->next;
        c->next = h;
        h = c;
        c = g;
    }
    return h;
}
✔ No memory leaks
Example: List Creation
typedef struct node {
    int val;
    struct node *next;
} *List;

List create(…)
{
    List x, t;
    x = NULL;
    while (…) {
        t = malloc();
        t->next = x;
        x = t;
    }
    return x;
}
✔ No null dereferences
✔ No memory leaks
✔ Returns acyclic list
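A compilable version of the list-creation loop might look like the sketch below. The len parameter, the values stored in val, the allocation-failure handling and the destroy helper are assumptions added for illustration; the slide leaves the loop condition and arguments elided.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct node {
    int val;
    struct node *next;
} *List;

/* Frees every cell of the list x. */
void destroy(List x) {
    while (x != NULL) {
        List next = x->next;
        free(x);
        x = next;
    }
}

/* Builds a list of len cells by prepending, as in the slide's loop.
   Returns NULL if len <= 0 or if an allocation fails. */
List create(int len) {
    List x = NULL;
    for (int i = 0; i < len; i++) {
        List t = malloc(sizeof *t);
        if (t == NULL) {        /* avoid leaking the cells built so far */
            destroy(x);
            return NULL;
        }
        t->val = i;
        t->next = x;
        x = t;
    }
    return x;
}
```

Because every new cell points to the previous head, the result is acyclic and every cell stays reachable from x, which are the properties listed on the slide.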
Example: Collecting Interpretation
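[Figure: control-flow graph of create (x = NULL; loop test with T/F branches; t = malloc(..); t->next = x; x = t; return x) annotated with the concrete stores that reach each program point: the empty store and lists of one, two, three, ... cells linked by n and pointed to by x and t.]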
Example: Abstract Interpretation
[Figure: the same control-flow graph annotated with abstract stores: the empty store, short lists pointed to by x and t, and lists whose tail cells are merged into a single node with n-edges.]
Challenge 1 - Memory Allocation
• The number of allocated objects/threads is not known
• Concrete state space is infinite
• How to guarantee termination?
Challenge 2 - Destructive Updates
• The program manipulates states using destructive updates
– e->next = t
• Hard to define concrete interpretation
• Harder to define abstract interpretation
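Challenge 2 - Destructive Update
[Figure: abstract stores over x, y, p and n-edges before and after y->next = NULL; this treatment of the update is unsound.]
[Figure: abstract stores over x, y, p and n-edges before and after y->next = NULL; this treatment of the update is imprecise.]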
Challenge 3 – Re-establishing Data Structure Invariants
• Data-structure invariants typically only hold at the beginning and end of ADT operations
• Need to verify that data-structure invariants are re-established
Challenge 3 – Re-establishing Data Structure Invariants
rotate(List first, List last) {
    if (first != NULL) {
        last->next = first;
        first = first->next;
        last = last->next;
        last->next = NULL;
    }
}
[Figure: the list between first and last after each statement of rotate; the list is temporarily cyclic until last->next = NULL re-establishes acyclicity.]
Plan
• Concrete interpretation
• Canonical abstraction
• Abstract interpretation using canonical abstraction
• The TVLA system
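Traditional Heap Interpretation
• States = Two level stores
– Env: $\mathit{Var} \to \mathit{Values}$
– fields: $\mathit{Loc} \to \mathit{Values}$
– $\mathit{Values} = \mathit{Loc} \cup \mathit{Atoms}$
• Example
– Env = $[x \mapsto 30, p \mapsto 79]$
– next = $[30 \mapsto 40, 40 \mapsto 50, 50 \mapsto 79, 79 \mapsto 90]$
– val = $[30 \mapsto 1, 40 \mapsto 2, 50 \mapsto 3, 79 \mapsto 4, 90 \mapsto 5]$
[Figure: five cells at addresses 30, 40, 50, 79, 90 holding the values 1 to 5; each cell stores the address of the next one and the last stores 0; x points to cell 30 and p to cell 79.]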
Predicate Logic
• Vocabulary
– A finite set of predicate symbols $P$, each with a fixed arity
• Logical structures $S$ provide meaning for predicates
– A set of individuals (nodes) $U^S$
– $p^S : (U^S)^k \to \{0, 1\}$
• FOTC (first-order logic with transitive closure) over TC, $\forall$, $\exists$, $\neg$, $\wedge$, $\vee$ expresses properties of logical structures
Representing Stores as Logical Structures
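• Locations are represented by individuals
• Program variables are represented by unary predicates
• Fields are represented by binary predicates
• Example
– $U = \{u_1, u_2, u_3, u_4, u_5\}$
– $x = \{u_1\}$, $p = \{u_3\}$
– $n = \{\langle u_1, u_2\rangle, \langle u_2, u_3\rangle, \langle u_3, u_4\rangle, \langle u_4, u_5\rangle\}$
[Figure: the list u1, u2, u3, u4, u5 linked by n-edges, with x pointing to u1 and p marking its node.]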
Formal Semantics of First Order Formulae
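• For a structure $S = \langle U^S, p^S \rangle$
• Formulae $\varphi$ with LVar free variables
• Assignment $z : \mathit{LVar} \to U^S$
• $[\![\varphi]\!]^S(z) \in \{0, 1\}$
$[\![\mathbf{1}]\!]^S(z) = 1$
$[\![\mathbf{0}]\!]^S(z) = 0$
$[\![p(v_1, v_2, \ldots, v_k)]\!]^S(z) = p^S(z(v_1), z(v_2), \ldots, z(v_k))$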
Formal Semantics of First Order Formulae
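• For a structure $S = \langle U^S, p^S \rangle$, formulae $\varphi$ with LVar free variables, and an assignment $z : \mathit{LVar} \to U^S$:
$[\![\varphi_1 \vee \varphi_2]\!]^S(z) = \max([\![\varphi_1]\!]^S(z), [\![\varphi_2]\!]^S(z))$
$[\![\varphi_1 \wedge \varphi_2]\!]^S(z) = \min([\![\varphi_1]\!]^S(z), [\![\varphi_2]\!]^S(z))$
$[\![\neg\varphi_1]\!]^S(z) = 1 - [\![\varphi_1]\!]^S(z)$
$[\![\exists v : \varphi_1]\!]^S(z) = \max \{ [\![\varphi_1]\!]^S(z[v \mapsto u]) : u \in U^S \}$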
Formal Semantics of Transitive Closure
• For a structure $S = \langle U^S, p^S \rangle$, formulae $\varphi$ with LVar free variables, and an assignment $z : \mathit{LVar} \to U^S$:
$[\![p^*(v_1, v_2)]\!]^S(z) = \max_{u_1, \ldots, u_k \in U^S,\ z(v_1) = u_1,\ z(v_2) = u_k} \ \min_{1 \le i < k} p^S(u_i, u_{i+1})$
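For a finite 2-valued structure, the max-over-paths and min-over-edges definition amounts to graph reachability. The sketch below computes it with Warshall's algorithm; MAXN and the function name closure are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stdbool.h>

#define MAXN 8  /* hypothetical bound on the universe size for this sketch */

/* Computes the transitive closure n+ of the binary predicate n over a
   universe of size k (Warshall's algorithm). If reflexive is true the
   result is the reflexive transitive closure n*. */
void closure(int k, bool n[MAXN][MAXN], bool out[MAXN][MAXN], bool reflexive) {
    for (int i = 0; i < k; i++)
        for (int j = 0; j < k; j++)
            out[i][j] = n[i][j] || (reflexive && i == j);
    for (int m = 0; m < k; m++)
        for (int i = 0; i < k; i++)
            for (int j = 0; j < k; j++)
                if (out[i][m] && out[m][j])
                    out[i][j] = true;
}
```

The reflexive case corresponds to paths of length zero (k = 1 in the formula), where the minimum over an empty set of edges is 1.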
Concrete Interpretation Rules
Statement        Update formula
x = NULL         $x'(v) = 0$
x = malloc()     $x'(v) = \mathit{IsNew}(v)$
x = y            $x'(v) = y(v)$
x = y->next      $x'(v) = \exists w : y(w) \wedge n(w, v)$
x->next = y      $n'(v, w) = (\neg x(v) \wedge n(v, w)) \vee (x(v) \wedge y(w))$
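The update formulas can be read as assignments over the predicate tables of a 2-valued structure. The following sketch does this for x = y->next and x->next = y; the Store type, MAXN and the function names are illustrative assumptions, not TVLA's representation.

```c
#include <assert.h>
#include <stdbool.h>

#define MAXN 8  /* hypothetical bound on the number of individuals */

/* A 2-valued logical structure with two pointer variables x, y
   (unary predicates) and one field n (binary predicate). */
typedef struct {
    int size;
    bool x[MAXN], y[MAXN];
    bool n[MAXN][MAXN];
} Store;

/* x = y->next:  x'(v) = exists w: y(w) and n(w, v) */
void assign_x_y_next(Store *s) {
    bool nx[MAXN];
    for (int v = 0; v < s->size; v++) {
        nx[v] = false;
        for (int w = 0; w < s->size; w++)
            nx[v] = nx[v] || (s->y[w] && s->n[w][v]);
    }
    for (int v = 0; v < s->size; v++)
        s->x[v] = nx[v];
}

/* x->next = y:  n'(v, w) = (not x(v) and n(v, w)) or (x(v) and y(w)).
   Each entry depends only on its own old value, so updating in place
   is safe. */
void assign_x_next_y(Store *s) {
    for (int v = 0; v < s->size; v++)
        for (int w = 0; w < s->size; w++)
            s->n[v][w] = (!s->x[v] && s->n[v][w]) || (s->x[v] && s->y[w]);
}
```

Note that neither function names a concrete address: the effect of a statement is described purely by which individuals satisfy which predicates.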
Invariants
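• No memory leaks: $\forall v : \bigvee_{x \in \mathit{PVar}} \exists w : x(w) \wedge n^*(w, v)$
• Acyclic list(x): $\forall v, w : x(v) \wedge n^*(v, w) \to \neg n^+(w, v)$
• Reverse(x): $\forall v, w, r : x(v) \wedge n^*(v, w) \to (n(w, r) \leftrightarrow n'(r, w))$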
Why use logical structures?
• Naturally model pointers and dynamic allocation
• No a priori bound on number of locations
• Use formulas to express semantics
• Indirect store updates using quantifiers
• Can model other features
– Concurrency
– Abstract fields
• Behaves well under abstraction
• Enables automatic construction of abstract interpreters from concrete interpretation rules (TVLA)
Collecting Interpretation
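• The set of reachable logical structures at every program point
• Statements operate on sets of logical structures
• Cannot be directly computed for programs with unbounded store and loops
x = NULL;
while (…) {
    t = malloc();
    t->next = x;
    x = t;
}
[Figure: the structures reaching the loop head: the empty structure, a single node u1 pointed to by x and t, a two-node list u1, u2, and in general lists u1, u2, ..., un linked by n.]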
Plan
• Concrete interpretation
• Canonical abstraction
• TVLA
Canonical Abstraction
• Convert logical structures of unbounded size into bounded size
• Guarantees that the number of logical structures at every program point is finite
• Every first-order formula can be conservatively interpreted
Kleene Three-Valued Logic
• 1: True
• 0: False
• 1/2: Unknown
• A join semi-lattice: $0 \sqcup 1 = 1/2$
[Figure: the information order places 0 and 1 below 1/2; the logical order is 0 < 1/2 < 1.]
Boolean Connectives [Kleene]
∧    | 0    1/2  1
0    | 0    0    0
1/2  | 0    1/2  1/2
1    | 0    1/2  1

∨    | 0    1/2  1
0    | 0    1/2  1
1/2  | 1/2  1/2  1
1    | 1    1    1
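The two tables are min and max in the logical order, and the join acts in the information order. A minimal sketch follows, assuming an integer encoding (0, 1, 2 for 0, 1/2, 1) chosen for this example only.

```c
#include <assert.h>

/* Truth values encoded so that the logical order 0 < 1/2 < 1 is the
   integer order: ZERO = 0, HALF = 1, ONE = 2 (an encoding chosen for
   this sketch). */
typedef enum { ZERO = 0, HALF = 1, ONE = 2 } Kleene;

Kleene k_and(Kleene a, Kleene b) { return a < b ? a : b; }   /* min */
Kleene k_or(Kleene a, Kleene b)  { return a > b ? a : b; }   /* max */
Kleene k_not(Kleene a)           { return (Kleene)(2 - a); } /* 1 - a */

/* Join in the information order: equal values stay as they are,
   different values collapse to 1/2. */
Kleene k_join(Kleene a, Kleene b) { return a == b ? a : HALF; }
```

On the definite values 0 and 1 these functions agree with ordinary Boolean logic, so a 2-valued structure is a special case of a 3-valued one.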
3-Valued Logical Structures
• A set of individuals (nodes) U
• Predicate meaning
– $p^S : (U^S)^k \to \{0, 1, 1/2\}$
Canonical Abstraction
• Partition the individuals into equivalence classes based on the values of their unary predicates
– Every individual is mapped into its equivalence class
• Collapse predicates via
– $p^S(u'_1, \ldots, u'_k) = \bigsqcup \{ p^B(u_1, \ldots, u_k) \mid f(u_1) = u'_1, \ldots, f(u_k) = u'_k \}$
• At most $2^{|A|}$ abstract individuals, where $A$ is the set of unary predicates
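The two steps (partition by unary predicate values, then collapse with join) can be sketched as below. The Concrete and Abstract types, the bounds MAXN and NUNARY, and the bitmask used as canonical name are assumptions made for this illustration; TVLA's own data structures may differ.

```c
#include <assert.h>
#include <stdbool.h>

#define MAXN 8    /* hypothetical bound on individuals */
#define NUNARY 2  /* unary predicates in this sketch: 0 = x, 1 = t */

typedef enum { ZERO = 0, HALF = 1, ONE = 2 } Kleene;

typedef struct {            /* 2-valued (concrete) structure */
    int size;
    bool unary[NUNARY][MAXN];
    bool n[MAXN][MAXN];
} Concrete;

typedef struct {            /* 3-valued (abstract) structure */
    int size;
    Kleene unary[NUNARY][MAXN];
    Kleene n[MAXN][MAXN];
    Kleene eq[MAXN][MAXN];
} Abstract;

/* Joins the definite value b into *acc; *seen records whether *acc
   already holds a value. */
static void join_into(Kleene *acc, bool *seen, bool b) {
    Kleene v = b ? ONE : ZERO;
    if (!*seen) { *acc = v; *seen = true; }
    else if (*acc != v) *acc = HALF;
}

/* Canonical abstraction: f[u] receives the abstract individual of u. */
void canonical_abstraction(const Concrete *c, Abstract *a, int f[MAXN]) {
    int key_of[MAXN];  /* canonical name (bitmask) of each abstract node */
    a->size = 0;
    for (int u = 0; u < c->size; u++) {
        int key = 0;
        for (int p = 0; p < NUNARY; p++)
            key |= (c->unary[p][u] ? 1 : 0) << p;
        int cls = 0;
        while (cls < a->size && key_of[cls] != key)
            cls++;
        if (cls == a->size) {       /* first individual with this name */
            key_of[cls] = key;
            a->size++;
        }
        f[u] = cls;
    }
    bool seen_n[MAXN][MAXN] = {{false}}, seen_eq[MAXN][MAXN] = {{false}};
    for (int u = 0; u < c->size; u++) {
        /* Unary values agree within a class, so no join is needed. */
        for (int p = 0; p < NUNARY; p++)
            a->unary[p][f[u]] = c->unary[p][u] ? ONE : ZERO;
        for (int w = 0; w < c->size; w++) {
            join_into(&a->n[f[u]][f[w]], &seen_n[f[u]][f[w]], c->n[u][w]);
            join_into(&a->eq[f[u]][f[w]], &seen_eq[f[u]][f[w]], u == w);
        }
    }
}
```

Applied to a three-cell list whose head is pointed to by x and t, this yields two abstract individuals: the head, and a node standing for both remaining cells whose n and eq self-values are 1/2.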
Canonical Abstraction
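x = NULL;
while (…) {
    t = malloc();
    t->next = x;
    x = t;
}
[Figure: the concrete list u1, u2, u3 linked by n, with x and t pointing to u1, is abstracted into u1 and the summary node u2,3; the n-edge from u1 to u2,3 and the n-edge from u2,3 to itself have value 1/2.]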
Canonical Abstraction and Equality
• Summary nodes may represent more than one element
• (In)equality need not be preserved under abstraction
• Explicitly record equality
• Summary nodes are nodes with eq(u, u)=1/2
Canonical Abstraction and Equality
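x = NULL;
while (…) {
    t = malloc();
    t->next = x;
    x = t;
}
[Figure: the same concrete and abstract structures with the eq predicate drawn explicitly: eq(u, u) = 1 on every non-summary node and eq(u2,3, u2,3) = 1/2 on the summary node.]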
Challenges: Heap & Concurrency [Yahav POPL’01]
• Concurrency with the heap is evil…
• Java threads are just heap allocated objects
• Data and control are strongly related
– Thread-scheduling info may require understanding of heap structure (e.g., scheduling queue)
– Heap analysis requires information about thread scheduling
Thread t1 = new Thread();
Thread t2 = new Thread();
…
t = t1;
…
t.start();
Configurations – Example
l_0: while (true) {
l_1:     synchronized(myLock) {
l_C:         // critical actions
l_2:     }
l_3: }
Concrete Configuration
[Figure: a concrete configuration with threads labeled at[l_C], at[l_1] and at[l_0]; rval[myLock] edges lead to the lock object, which has a held_by edge and blocked edges.]
Abstract Configuration
[Figure: the corresponding abstract configuration with one node for each of at[l_C], at[l_1] and at[l_0], and rval[myLock], held_by and blocked edges to the lock.]
Examples Verified
Program                                      Property
twoLock Q                                    No interference; No memory leaks; Partial correctness
Producer/consumer                            No interference; No memory leaks
Apprentice Challenge                         Counter increasing
Dining philosophers with resource ordering   Absence of deadlock
Mutex                                        Mutual exclusion
Web Server                                   No interference
Summary
• Canonical abstraction guarantees finite number of structures
• The concrete location of an object has no significance
• But what is the significance of 3-valued logic?
Topics
• Embedding
• Instrumentation
• Abstract Interpretation
• [Extensions]
Embedding
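[Figure: a list u1, ..., u6 with x pointing to u1 is embedded into the three-node structure u12, u34, u56, which in turn is embedded into the two-node structure u123, u456.]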
Embedding
• $B \sqsubseteq^f S$
• onto function $f$
• $p^B(u_1, \ldots, u_k) \sqsubseteq p^S(f(u_1), \ldots, f(u_k))$
• $S$ is a tight embedding of $B$ with respect to $f$ if:
• $p^S(u^\#_1, \ldots, u^\#_k) = \bigsqcup \{ p^B(u_1, \ldots, u_k) \mid f(u_1) = u^\#_1, \ldots, f(u_k) = u^\#_k \}$
• Canonical Abstraction is a tight embedding
Embedding (cont)
• $S_1 \sqsubseteq^f S_2 \Rightarrow$ every concrete state represented by $S_1$ is also represented by $S_2$
• The set of nodes in $S_1$ and $S_2$ may be different
– No meaning for node names (abstract locations)
• $\gamma(S^\#) = \{ S : S \text{ is a 2-valued structure},\ S \sqsubseteq^f S^\# \}$
Embedding Theorem
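• Assume $B \sqsubseteq^f S$, i.e. $p^B(u_1, \ldots, u_k) \sqsubseteq p^S(f(u_1), \ldots, f(u_k))$
• Then every formula $\varphi$ is preserved:
– If $[\![\varphi]\!] = 1$ in $S$, then $[\![\varphi]\!] = 1$ in $B$
– If $[\![\varphi]\!] = 0$ in $S$, then $[\![\varphi]\!] = 0$ in $B$
– If $[\![\varphi]\!] = 1/2$ in $S$, then $[\![\varphi]\!]$ could be 0 or 1 in $B$
Embedding Theorem
• Every formula $\varphi$ is preserved:
– If $[\![\varphi]\!] = 1$ in $S$, then $[\![\varphi]\!] = 1$ for all $B \in \gamma(S)$
– If $[\![\varphi]\!] = 0$ in $S$, then $[\![\varphi]\!] = 0$ for all $B \in \gamma(S)$
– If $[\![\varphi]\!] = 1/2$ in $S$, then $[\![\varphi]\!]$ could be 0 or 1 in the structures of $\gamma(S)$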
Challenge 2 - Destructive Update
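Sound
[Figure: abstract stores over x, y, p and n-edges before and after y->next = NULL, with the update computed by the formula below.]
$n'(v, w) = \neg y(v) \wedge n(v, w)$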
Embedding Theorem
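[Figure: abstract structure with x and t pointing to u1, an n-edge from u1 to the summary node u2,3, and an n-edge from u2,3 to itself.]
$\exists v : x(v)$    1 = Yes
$\exists v : x(v) \wedge t(v)$    1 = Yes
$\exists v : x(v) \wedge y(v)$    0 = No
$\exists v, w : x(v) \wedge n(v, w)$    1/2 = Maybe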
v, w: x(v)n(v, w) n(v, w) 0=No
$\exists v, w : x(v) \wedge n^*(v, w) \wedge n^+(w, w)$    1/2 = Maybe
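The first four answers can be reproduced by evaluating the formulas with Kleene semantics (max for the existential quantifier, min for conjunction) on the two-node abstract structure. The encoding and the function names below are assumptions made for this sketch.

```c
#include <assert.h>

typedef enum { ZERO = 0, HALF = 1, ONE = 2 } Kleene;

#define NODES 2  /* abstract individuals: 0 = u1, 1 = u2,3 (summary) */

static Kleene k_min(Kleene a, Kleene b) { return a < b ? a : b; }
static Kleene k_max(Kleene a, Kleene b) { return a > b ? a : b; }

/* exists v: p(v) and q(v) */
Kleene exists_both(Kleene p[NODES], Kleene q[NODES]) {
    Kleene r = ZERO;
    for (int v = 0; v < NODES; v++)
        r = k_max(r, k_min(p[v], q[v]));
    return r;
}

/* exists v, w: p(v) and n(v, w) */
Kleene exists_succ(Kleene p[NODES], Kleene n[NODES][NODES]) {
    Kleene r = ZERO;
    for (int v = 0; v < NODES; v++)
        for (int w = 0; w < NODES; w++)
            r = k_max(r, k_min(p[v], n[v][w]));
    return r;
}
```

With x and t equal to 1 on u1 only, y equal to 0 everywhere, and both n-edges into u2,3 equal to 1/2, exists_both(x, x) and exists_both(x, t) evaluate to 1, exists_both(x, y) to 0, and exists_succ(x, n) to 1/2.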
Summary
• The embedding theorem eliminates the need for proving near commutativity
• Guarantees soundness
• Applied to arbitrary logics
• But can be imprecise
Limitations
• Information on summary nodes is lost
• Leads to useless verification
Increasing Precision
• User (Programming Language) supplied global invariants
– Naturally expressed in FOTC
• Record extra information in the concrete interpretation
– Tune the abstraction
– Refine concretization
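Cyclicity predicate
$c[x]() = \exists v_1, v_2 : x(v_1) \wedge n^*(v_1, v_2) \wedge n^+(v_2, v_2)$
[Figure: an acyclic list u1, u2, ..., un with x and t pointing to u1, and its abstraction into u1 and the summary node u2..n; c[x]() = 0 in both.]
[Figure: the same list with an additional n-edge closing a cycle, and its abstraction; c[x]() = 1 in both.]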
Heap Sharing predicate
$is(v) = \exists v_1, v_2 : n(v_1, v) \wedge n(v_2, v) \wedge v_1 \neq v_2$
[Figure: an unshared list u1, u2, ..., un and its abstraction into u1 and the summary node u2..n; is(v) = 0 on every node.]
[Figure: a list in which one cell has two incoming n-edges; is(v) = 1 on that cell and 0 elsewhere, so the abstraction keeps the shared cell u2 apart from the summary node u3..n.]
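On a concrete (2-valued) structure the defining formula of is(v) simply asks whether v has at least two distinct n-predecessors. A small sketch, with MAXN and the function name as illustrative assumptions:

```c
#include <assert.h>
#include <stdbool.h>

#define MAXN 8  /* hypothetical bound on the number of individuals */

/* is(v) = exists v1, v2: n(v1, v) and n(v2, v) and v1 != v2,
   i.e. v has at least two distinct n-predecessors. */
bool is_shared(int size, bool n[MAXN][MAXN], int v) {
    int preds = 0;
    for (int u = 0; u < size; u++)
        if (n[u][v])
            preds++;
    return preds >= 2;
}
```

Recording this value as a unary predicate lets the abstraction distinguish shared cells from unshared ones, since individuals are merged only when their unary predicates agree.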
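Concrete Interpretation Rules
Statement        Update formula
x = NULL         $x'(v) = 0$
x = malloc()     $x'(v) = \mathit{IsNew}(v)$
x = y            $x'(v) = y(v)$
x = y->next      $x'(v) = \exists w : y(w) \wedge n(w, v)$
x->next = NULL   $n'(v, w) = \neg x(v) \wedge n(v, w)$
                 $is'(v) = is(v) \wedge \exists v_1, v_2 : n(v_1, v) \wedge n(v_2, v) \wedge \neg x(v_1) \wedge \neg x(v_2) \wedge \neg eq(v_1, v_2)$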
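Reachability predicate
$t[n](v_1, v_2) = n^*(v_1, v_2)$
[Figure: a list u1, u2, ..., un with x and t pointing to u1, and its abstraction into u1 and the summary node u2..n; t[n] edges record which nodes are reachable from which along n.]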
Additional Instrumentation predicates
• reachable-from-variable-x(v)
• $cfb(v) = \forall v_1 : f(v, v_1) \to b(v_1, v)$
• tree(v)
• dag(v)
• $inOrder(v) = \forall v_1 : n(v, v_1) \to dle(v, v_1)$
• Weakest Precondition [Ramalingam PLDI 02]
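As an illustration of how inOrder relates to sortedness, the sketch below checks it on a concrete singly linked list. It assumes that dle(v, v1) compares the data fields (v's data is less than or equal to v1's); the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct list_cell {
    int data;
    struct list_cell *n;
} *List;

/* inOrder(v): every n-successor of v holds data >= v's data. A cell
   has at most one successor, so this is a single comparison. */
bool in_order(List v) {
    return v->n == NULL || v->data <= v->n->data;
}

/* The list is sorted when inOrder holds for every cell reachable
   from x. */
bool sorted(List x) {
    for (List v = x; v != NULL; v = v->n)
        if (!in_order(v))
            return false;
    return true;
}
```

Sortedness of the whole list thus reduces to a local property of each cell, which is what makes it expressible as a unary instrumentation predicate.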
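Instrumentation (Summary)
• Refines the abstraction
• Adds global invariants
• But requires update-formulas (generated automatically in TVLA 2)
$is(v) = \exists v_1, v_2 : n(v_1, v) \wedge n(v_2, v) \wedge v_1 \neq v_2$
$is(v) \Leftrightarrow \exists v_1, v_2 : n(v_1, v) \wedge n(v_2, v) \wedge v_1 \neq v_2$
$\gamma(S^\#) = \{ S : S \text{ satisfies the global invariants},\ S \sqsubseteq^f S^\# \}$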
Plan
• Embedding Theorem
• Instrumentation
• Abstract interpretation using canonical abstraction
• TVLA
Best Conservative Interpretation (CC79)
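[Figure: the best conservative interpretation of a statement st: concretization maps the abstract representation to a concrete representation, the collecting interpretation $[\![st]\!]^c$ is applied, and abstraction maps the result back to an abstract representation; the abstract interpretation $[\![st]\!]^\#$ goes directly from abstract representation to abstract representation.]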
Best Transformer (x = x->n)
[Figure: the best transformer applied to abstract structures over x and y: inverse embedding enumerates the concrete structures they represent, the update formulas are evaluated on each of them, and canonic abstraction maps the results back to abstract structures.]
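“Focus”-Based Transformer (x = x->n)
[Figure: Focus(x->n) performs a “partial” concretization of the abstract structures over x and y, the update formulas are evaluated with Kleene semantics on each focused structure, and canonic abstraction is applied to the results.]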
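Semantic Reduction
• Improve the precision by recovering properties of the program semantics
• A Galois connection $(L_1, \alpha, \gamma, L_2)$
• An operation $op : L_2 \to L_2$ is a semantic reduction if $\forall l \in L_2 : op(l) \sqsubseteq l$ and $\gamma(op(l)) = \gamma(l)$
• Can be applied before and after basic operations
[Figure: $l$ and $op(l)$ in $L_2$ concretize to the same element of $L_1$.]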
Three Valued Logic Analysis (TVLA)
T. Lev-Ami & R. Manevich
• Input (FOTC)
– Concrete interpretation rules
– Definition of instrumentation predicates
– Definition of safety properties
– First Order Transition System (TVP)
• Output
– Warnings (text)
– The 3-valued structure at every node (invariants)
Null Dereferences
Demo
typedef struct element
{
    int value;
    struct element *n;
} Element;

bool search(int value, Element *x)
{
    Element *c = x;
    while (x != NULL) {
        if (c->value == value)
            return TRUE;
        c = c->n;
    }
    return FALSE;
}
TVLA inputs
• TVP - Three Valued Program
– Predicate declaration
– Action definitions SOS
– Control flow graph
• TVS - Three Valued Structure
Program independent
Demo
Challenge 1
• Write a C procedure on which TVLA reports a false null dereference
Proving Correctness of Sorting Implementations (Lev-Ami, Reps, S, Wilhelm ISSTA 2000)
• Partial correctness
– The elements are sorted
– The list is a permutation of the original list
• Termination
– At every loop iteration the set of elements reachable from the head decreases
Example: InsertSort
Run Demo
typedef struct list_cell {
    int data;
    struct list_cell *n;
} *List;

List InsertSort(List x) {
    List r, pr, rn, l, pl;
    r = x;
    pr = NULL;
    while (r != NULL) {
        l = x;
        rn = r->n;
        pl = NULL;
        while (l != r) {
            if (l->data > r->data) {
                pr->n = rn;
                r->n = l;
                if (pl == NULL)
                    x = r;
                else
                    pl->n = r;
                r = pr;
                break;
            }
            pl = l;
            l = l->n;
        }
        pr = r;
        r = rn;
    }
    return x;
}
pred.tvp
actions.tvp
Example: InsertSort
Run Demo
typedef struct list_cell {
    int data;
    struct list_cell *n;
} *List;

List InsertSort(List x) {
    List r, pr, rn, l, pl;
    if (x == NULL)
        return NULL;
    pr = x;
    r = x->n;
    while (r != NULL) {
        pl = x;
        rn = r->n;
        l = x->n;
        while (l != r) {
            if (l->data > r->data) {
                pr->n = rn;
                r->n = l;
                pl->n = r;
                r = pr;
                break;
            }
            pl = l;
            l = l->n;
        }
        pr = r;
        r = rn;
    }
    return x;
}
Example: Reverse
Run Demo
typedef struct list_cell {
    int data;
    struct list_cell *n;
} *List;

List reverse(List x) {
    List y, t;
    y = NULL;
    while (x != NULL) {
        t = y;
        y = x;
        x = x->n;
        y->n = t;
    }
    return y;
}
Challenge
• Write a sorting C procedure on which TVLA fails to prove sortedness or permutation
Example: Mark and Sweep
void Mark(Node root) {
    if (root != NULL) {
        pending = ∅
        pending = pending ∪ {root}
        marked = ∅
        while (pending ≠ ∅) {
            x = SelectAndRemove(pending)
            marked = marked ∪ {x}
            t = x->left
            if (t ≠ NULL)
                if (t ∉ marked)
                    pending = pending ∪ {t}
            t = x->right
            if (t ≠ NULL)
                if (t ∉ marked)
                    pending = pending ∪ {t}
        }
    }
    assert(marked == Reachset(root))
}

void Sweep() {
    unexplored = Universe
    collected = ∅
    while (unexplored ≠ ∅) {
        x = SelectAndRemove(unexplored)
        if (x ∉ marked)
            collected = collected ∪ {x}
    }
    assert(collected == Universe – Reachset(root))
}
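The mark phase can be made concrete in C roughly as follows. This is a sketch under stated assumptions: the set marked is a flag stored in each node, the set pending is a fixed-size stack (MAXNODES is a hypothetical capacity, so the graph must have at most that many nodes), and a node is marked when it is pushed rather than when it is removed, which keeps each node from being pushed twice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAXNODES 16  /* hypothetical capacity of the pending worklist */

typedef struct node {
    struct node *left, *right;
    bool marked;
} Node;

/* Marks every node reachable from root. pending is a stack that plays
   the role of the pending set; each node is pushed at most once, so a
   graph of at most MAXNODES nodes cannot overflow it. */
void mark(Node *root) {
    Node *pending[MAXNODES];
    int top = 0;
    if (root == NULL)
        return;
    root->marked = true;
    pending[top++] = root;
    while (top > 0) {
        Node *x = pending[--top];
        Node *succ[2] = { x->left, x->right };
        for (int i = 0; i < 2; i++) {
            Node *t = succ[i];
            if (t != NULL && !t->marked) {
                t->marked = true;
                pending[top++] = t;
            }
        }
    }
}
```

When the loop ends, the marked nodes are those reachable from root, which is the property the assert in the pseudocode states.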
Run Demo
pred.tvp
Challenge 2
• Use TVLA to show termination of markAndSweep
The Canvas Project (with IBM Watson)
(Component Annotation, Verification and Stuff)
Verification of Safety Properties (PLDI’02, 04)
• Component: a library with cleanly encapsulated state
• Client: a program that uses the library
• Lightweight Specification: "correct usage" rules a client must follow, e.g. "call open() before read()"
• Certification: does the client program satisfy the lightweight specification?
Prototype Implementation
• Applied to several example programs
– Up to 5000 lines of Java
• Used to verify
– Absence of concurrent modification exception
– JDBC API conformance
– IOStreams API conformance
Scaling
• Staged analysis
• Controlled complexity
– More coarse abstractions [Manevich SAS’04]
• Handle libraries
– Use procedure specifications [Yorsh, TACAS’04]
– Decision procedures for linked data structures [Immerman, CAV’04, Lev-Ami, CADE’05]
• Handling procedures
– Compute procedure summaries [Jeannet, SAS’04]
– Local heaps [Rinetzky, POPL’05]
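Local heaps [Rinetzky, POPL’05]
[Figure: a heap with variables x, y, t and g before and at the call p(x); the callee sees only the part of the heap reachable from x.]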
Why is Heap Analysis Difficult?
• Destructive updating through pointers
– p->next = q
– Produces complicated aliasing relationships
– Track aliasing on 3-valued structures
• Dynamic storage allocation
– No bound on the size of run-time data structures
– Canonical abstraction yields finite-sized 3-valued structures
• Data-structure invariants typically only hold at the beginning and end of operations
– Need to verify that data-structure invariants are re-established
– Query the 3-valued structures that arise at the exit
Summary
• Canonical abstraction is powerful
– Intuitive
– Adapts to the property of interest
• Used to verify interesting program properties
– Very few false alarms
• But scaling is an issue
Summary
• Effective Abstract Interpretation
– Always terminates
– Precise enough
– But still expensive
• Can model
– Heap
– Unbounded arrays
– Concurrency
• More instrumentation can mean a more efficient analysis
• But canonic abstraction is limited
– Correlation between list lengths
– Arithmetic
– Partial heaps