
Modular Static Analysis with Sets and Relations for Verifying Data Structure Consistency

Viktor Kuncak

Computer Science and Artificial Intelligence Lab

MIT

Joint work with: Martin Rinard, Andreas Podelski, Daniel Jackson, Patrick Lam, Thomas Wies, Karen Zee, Huu Hai Nguyen, Peter Schmitt, Suhabe Bugrara

Program analysis and verification

Discover and verify properties of software systems

Practical relevance: programmer productivity
– performance: compiler optimizations
– reliability: discovering and preventing errors
– maintainability: understanding code

Broader implications
– automated analysis of formal artifacts (e.g. XML documents, formal proofs)

Spectrum of analysis techniques

Broad research area with many dimensions
– bug finding versus bug prevention
– control-intensive versus data-intensive systems
– generic versus application-specific properties

Original ideal: full program verification

Reality: verify partial correctness properties
– success story: type systems
– active area: temporal properties (typestate)

Trend: toward more complex properties

Data structure consistency properties

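[Figure: a doubly-linked list from root with next and prev fields; a list with a first pointer and a size field equal to 3; a tree with left and right fields]

– acyclicity of next
– x.next.prev = x
– the size field is consistent with the number of stored objects
– the graph is a tree

The shape is not given by types but by structural properties, and it may change over time: a declaration such as class Node { Node f1, f2; } admits lists, trees, and arbitrary graphs alike. Data structures contain an unbounded number of dynamically allocated objects.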
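For illustration, the list properties above can be written as a run-time check. This Python sketch is hypothetical (Node, DList, check_consistency, and build are names chosen here, not part of the analyzed systems) and inspects one concrete heap; the static analysis in this talk instead aims to prove such properties for every execution.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(eq=False, repr=False)  # eq=False keeps identity-based hashing
class Node:
    next: Optional["Node"] = None
    prev: Optional["Node"] = None


@dataclass
class DList:
    first: Optional[Node] = None
    size: int = 0


def check_consistency(lst: DList) -> List[str]:
    """Return the violated properties (an empty list means consistent)."""
    errors: List[str] = []
    seen = set()
    x = lst.first
    while x is not None:
        if x in seen:  # revisiting a node means next has a cycle
            errors.append("next is cyclic")
            return errors  # stop: the node count would be meaningless
        seen.add(x)
        if x.next is not None and x.next.prev is not x:
            errors.append("x.next.prev != x")
        x = x.next
    if len(seen) != lst.size:
        errors.append("size field inconsistent")
    return errors


def build(n: int) -> Tuple[DList, List[Node]]:
    """Build a well-formed doubly-linked list of n nodes."""
    nodes = [Node() for _ in range(n)]
    for a, b in zip(nodes, nodes[1:]):
        a.next, b.prev = b, a
    return DList(nodes[0] if nodes else None, n), nodes
```

A list from build(3) passes; changing its size field to 4, or pointing the last node's next back to the first node, is reported.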

Inconsistent data structures

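Inconsistent data structures can cause
– program crashes
– looping (for example, when next contains a cycle)
– unexpected outcomes of operations, such as removing two elements instead of one

[Figure: lists whose next and prev fields are inconsistent]

These are violations of internal consistency.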

External data structure consistency

If a person has borrowed a book, then
– the person is registered with the library, and
– the book is in the catalog

[Figure: object model with Person and Book related by borrows, with multiplicities [0..4] and [0..1]]
– A person can borrow at most 4 books at a time.
– Two persons cannot borrow the same book.

External consistency properties
– correlate different data structures
– are global
– are meaningful to users of the system
– capture design constraints (object models)
– when violated, can lead to policy violations

External consistency relies on internal consistency even to be meaningful.
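A minimal sketch of these external constraints as a run-time check, assuming persons and books are identified by strings and borrows is a set of (person, book) pairs; the function name is hypothetical. The check is only meaningful if the underlying sets and relation are themselves internally consistent.

```python
from collections import Counter
from typing import List, Set, Tuple


def check_library(registered: Set[str], catalog: Set[str],
                  borrows: Set[Tuple[str, str]]) -> List[str]:
    """Check the external consistency constraints of the library model."""
    errors: List[str] = []
    for person, book in sorted(borrows):
        if person not in registered:
            errors.append(f"{person} borrows {book} but is not registered")
        if book not in catalog:
            errors.append(f"{book} is borrowed but not in the catalog")
    books_per_person = Counter(p for p, _ in borrows)  # multiplicity [0..4]
    persons_per_book = Counter(b for _, b in borrows)  # multiplicity [0..1]
    errors += [f"{p} borrows {k} books (max 4)"
               for p, k in sorted(books_per_person.items()) if k > 4]
    errors += [f"{b} is borrowed by {k} persons (max 1)"
               for b, k in sorted(persons_per_book.items()) if k > 1]
    return errors
```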

Goal

Prove data structure consistency
– for all program executions (soundness)
– with a high level of automation
– for both internal and external consistency
– for both the implementation and the use of data structures

Using static analysis to enforce data structure consistency

[Figure: the static analyzer takes the source code of a program together with its consistency properties and reports either "data structures are consistent" or "error in program!", for example a violation of x.next.prev = x]

Example input:

. . .
proc remove(x : Node) {
  Node p = x.prev; Node n = x.next;
  if (p != null) p.next = n; else root = n;
  if (n != null) n.prev = p;
}
. . .

Challenges in verifying consistency

– complex, heterogeneous data structures, in the context of the application; developer-defined properties
– precision
– no single approach will work
– communication with developers
– scalability

Outline

Goal: verify data structure consistency

Our approach through an example

Bohne: one of the analyses in our system

Current status and ongoing work

Future work

Example: Minesweeper Game

Analyzed using our system (based on a Java version)

[Figure: actual screenshot of the game]

Minesweeper game data structures

[Figure: Cell objects with boolean fields init (true) and isExposed (false), linked into a list by next and prev fields]

Minesweeper consistency properties

[Figure: Cell objects with fields init, isExposed, next, and prev; initialized cells whose isExposed flag is false are linked into the hidden cells list]

– next is acyclic
– prev is the inverse of next
– an object is in the hidden cells list iff its init flag is true and its isExposed flag is false

Complex consistency properties: formalization as an invariant

An invariant is an expression that is true whenever the program reaches certain points. The property "an object is in the hidden cells list iff its init flag is true and its isExposed flag is false" becomes

{x | next*(HiddenListRoot, x)} = {x | x.init & !x.isExposed}

Difficulties
– need to track exact reachability properties
– need to correlate linking information with stored data

Need a way to deal with complexity

Towards factoring out complexity

ListContent = {x | next*(HiddenListRoot, x)}
UnexposedCells = {x | x.init & !x.isExposed}
ListContent = UnexposedCells

How can we enable such abstract reasoning in terms of sets in our program?

Minesweeper source code:

record Cell {
  init : bool;
  isExposed : bool;
  next, prev : Cell;
}
proc expose(c : Cell) { remove(c); setFlag(c); }
proc remove(c : Cell) {
  Cell p = c.prev; Cell n = c.next;
  if (p != null) p.next = n; else root = n;
  if (n != null) n.prev = p;
}
proc setFlag(c : Cell) { c.isExposed = true; }

– encapsulate state: split Cell into partial records, one in the List module (next, prev) and one in the Board module (init, isExposed)
– encapsulate operations: remove goes into the List module, setFlag into the Board module
– replace implementations (in the analysis only) with set specifications over List.content and Board.UnexpCells, related by the invariant List.content = Board.UnexpCells

Encapsulating complexity in modules

No need to reason about data structure details! The rest of the program works with the sets content and UnexpCells, so more scalable analyses can be used.

Reasoning in terms of sets

Invariant: List.content = Board.UnexpCells

– List module: proc remove(c : Cell) ensures content' = content - c
– Board module: proc setFlag(c : Cell) ensures UnexpCells' = UnexpCells - c

Since expose(c) calls remove(c) and then setFlag(c), both sets lose exactly c, so the equality is preserved.
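The set-level argument can be cross-checked on a concrete board. In this Python sketch (a hypothetical port: Game, content, unexp_cells, and the other names are chosen here, and both modules are collapsed into one class for brevity) the two abstraction functions are computed directly, and the test confirms that expose keeps List.content equal to Board.UnexpCells. Unlike the analysis, it only checks particular executions.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(eq=False, repr=False)
class Cell:
    init: bool = True
    is_exposed: bool = False
    next: Optional["Cell"] = None
    prev: Optional["Cell"] = None


class Game:
    def __init__(self, n: int) -> None:
        # every cell starts initialized, hidden, and on the hidden cells list
        self.cells = [Cell() for _ in range(n)]
        for a, b in zip(self.cells, self.cells[1:]):
            a.next, b.prev = b, a
        self.root: Optional[Cell] = self.cells[0] if self.cells else None

    def content(self) -> Set[Cell]:
        """List module abstraction: content = {x | next*(root, x)}."""
        result, x = set(), self.root
        while x is not None:
            result.add(x)
            x = x.next
        return result

    def unexp_cells(self) -> Set[Cell]:
        """Board module abstraction: UnexpCells = {x | x.init & !x.isExposed}."""
        return {c for c in self.cells if c.init and not c.is_exposed}

    def remove(self, c: Cell) -> None:  # List module
        p, n = c.prev, c.next
        if p is not None:
            p.next = n
        else:
            self.root = n
        if n is not None:
            n.prev = p

    def set_flag(self, c: Cell) -> None:  # Board module
        c.is_exposed = True

    def expose(self, c: Cell) -> None:
        self.remove(c)
        self.set_flag(c)
```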

Justifying reasoning in terms of sets

Clients rely only on the invariant List.content = Board.UnexpCells; each module must show that its implementation conforms to its set specification:
– List module: proc remove(c : Cell) { ... if (p != null) p.next = n; ... } must ensure content' = content - c
– Board module: given UnexpCells = {x | x.init & !x.isExposed}, proc setFlag(c : Cell) { c.isExposed = true; } must ensure UnexpCells' = UnexpCells - c

Three sections of a module

– specification section (spec module): public set variables and procedure contracts
– abstraction section (abst module): abstraction functions such as content = {x | next*(root, x)} and the representation invariants; the invariant has been modularized!
– implementation section (impl module): the code

spec module List {
  specvar content : Cell set;
  proc remove(c : Cell)
    requires c in content & c != null
    modifies content
    ensures content' = content - c;
}

abst module List {
  content = { x : Cell | next* root x };
  invariant tree [next];
  invariant ALL x y. prev x = y → (x ≠ null ∧ y ≠ null → next y = x);
}

impl module List {
  partial record Cell { next, prev : Cell; }
  var root : Cell;
  proc remove(c : Cell) {
    Cell p = c.prev; Cell n = c.next;
    if (p != null) p.next = n; else root = n;
    if (n != null) n.prev = p;
  }
}
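One way to read the three sections operationally is as run-time contracts. In this Python sketch (hypothetical names; Hob proves the contracts statically rather than checking them at run time) the abstraction function computes content from root, and remove asserts the requires clause before its body and the ensures clause and prev/next invariant after it. Python assertions are disabled under python -O.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(eq=False, repr=False)
class Cell:
    next: Optional["Cell"] = None
    prev: Optional["Cell"] = None


class ListModule:
    def __init__(self) -> None:
        self.root: Optional[Cell] = None  # impl section: var root

    def content(self) -> Set[Cell]:
        """abst section: content = {x | next*(root, x)}."""
        result, x = set(), self.root
        while x is not None and x not in result:
            result.add(x)
            x = x.next
        return result

    def invariant_holds(self) -> bool:
        """abst section: prev x = y --> next y = x, checked for x in content
        (restricting the check to content is an assumption of this sketch)."""
        return all(x.prev is None or x.prev.next is x for x in self.content())

    def insert_front(self, c: Cell) -> None:  # helper, not in the slide
        c.next, c.prev = self.root, None
        if self.root is not None:
            self.root.prev = c
        self.root = c

    def remove(self, c: Cell) -> None:
        # spec section: requires c in content & c != null
        assert c is not None and c in self.content(), "requires violated"
        old_content = self.content()
        p, n = c.prev, c.next  # impl section, as in the slide
        if p is not None:
            p.next = n
        else:
            self.root = n
        if n is not None:
            n.prev = p
        # spec section: ensures content' = content - c
        assert self.content() == old_content - {c}, "ensures violated"
        assert self.invariant_holds(), "List invariant violated"
```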

List module

Showing conformance uses precise analyses, but only inside the List module: reasoning about List invariants is confined to the List module.

Verification of List has dual benefits:
• it justifies the analysis of clients
• it proves partial correctness of List operations

Summary of our approach: two steps

[Figure: an application (data structure client) uses the interfaces of modules A and B; each interface is backed by an implementation]

– reasoning about the program in terms of simpler interfaces (uses of interfaces, global consistency): scalable analyses
– checking that interfaces reflect implementations and that internal consistency is preserved: precise analyses

This approach addresses challenges

[Figure: the application (data structure client) and the implementations of A and B, each checked against the A and B interfaces by a possibly different analysis (analysis1, analysis2, analysis3)]

– reasoning about the program in terms of simpler structures, and checking that the abstract structures reflect the implementations
– scalability
– precision: within data structures
– heterogeneity: multiple analyses
– communication: developers communicate with the system via interfaces

The idea is used as data abstraction in manual verification, VDM, and ESC/Java.

Key question in automating the approach (while keeping it useful)

How to choose the interface language between the application and the A and B implementations?

Our solution: set algebra

Set algebra as interface language

Useful: expresses key data structure properties
– disjointness (A ∩ B = ∅), inclusion (A ⊆ B)
– insertion (S' = S ∪ {x}), removal (S' = S \ {x})
– conceptual object state
• initialization, sequencing of API operations
• symbolic notations for hierarchical state charts

Verifiable: on both sides of the set abstraction
– typestate techniques for interface uses
– shape analyses for interface implementations

Two systems based on this insight

[Figure: architecture of the Hob data structure analysis system and its successor, the Jahob data structure analysis system. Components: flag analysis for high-level properties; Bohne invariant inference; field constraint analysis; annotation inference algorithms; verification condition generator; BAPA decision procedure; a dispatcher to decision procedures and theorem provers (MONA, CVC Lite, Isabelle, the Omega solver for linear arithmetic). Associated publications: POPL'02, SAS'03, VMCAI'04, VMCAI'05, CC'05, AOSD'05, VSTTE'05, CADE'05, JAR, SVV'05, VMCAI'06.]


Bohne analysis properties

Analyzes linked data structures

Precisely handles reachability properties; can define the set of elements reachable from root:

content = { x | next*(root, x) }

Predictable: based on decision procedures

[Figure: a doubly-linked list with next and prev fields, and a tree with left and right fields starting at root]

Starting point


Verification condition (VC): a logical formula saying “if the precondition holds at entry, then the postcondition holds in the final state, invariants are preserved, and there are no run-time errors”

Basic verifier = verification condition generator + decision procedure

[Figure: the implementation, specification, and abstraction sections of a module are translated by the verification condition generator; the decision procedure answers valid ("data structures are consistent") or invalid ("error in program!"); the Bohne analysis builds on the same components]

The generator performs a syntactic translation (as in symbolic execution) of (pre, body, post) into

VC: pre → wlp(body, post)
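To make the formula concrete: for straight-line code with assignments, sequencing, and conditionals, wlp can be computed compositionally. This Python sketch (hypothetical helper names; integer variables only, no heap) represents each statement by its wlp transformer and, standing in for a decision procedure, checks pre → wlp(body, post) by brute force over a small range of values.

```python
from itertools import product
from typing import Callable, Dict, List

State = Dict[str, int]
Pred = Callable[[State], bool]
Stmt = Callable[[Pred], Pred]  # a statement is represented by its wlp transformer


def assign(var: str, expr: Callable[[State], int]) -> Stmt:
    """wlp(var := e, Q) = Q[e/var]."""
    def wlp(post: Pred) -> Pred:
        return lambda s: post({**s, var: expr(s)})
    return wlp


def seq(*stmts: Stmt) -> Stmt:
    """wlp(S1; S2, Q) = wlp(S1, wlp(S2, Q))."""
    def wlp(post: Pred) -> Pred:
        for stmt in reversed(stmts):
            post = stmt(post)
        return post
    return wlp


def ite(cond: Pred, then_: Stmt, else_: Stmt) -> Stmt:
    """wlp(if b then S1 else S2, Q) = (b --> wlp(S1, Q)) & (!b --> wlp(S2, Q))."""
    def wlp(post: Pred) -> Pred:
        w_then, w_else = then_(post), else_(post)
        return lambda s: w_then(s) if cond(s) else w_else(s)
    return wlp


def vc_holds(pre: Pred, body: Stmt, post: Pred, variables: List[str],
             domain=range(-3, 4)) -> bool:
    """Check VC: pre --> wlp(body, post) on every state over a small domain.
    False means a genuine counterexample; True only means that none
    exists within the domain (unlike a real decision procedure)."""
    vc = body(post)
    for values in product(domain, repeat=len(variables)):
        s = dict(zip(variables, values))
        if pre(s) and not vc(s):
            return False
    return True
```

For example, an absolute-value body satisfies the postcondition y >= 0, while a variant whose else branch assigns x - 1 fails at x = 0.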

Decision procedure

Goal: precise reasoning about reachability

Reachability properties in trees are decidable
– Monadic Second-Order Logic over trees
– existing MONA decision procedure
• construct a tree automaton for each formula
• check emptiness of the language of the automaton

Using this approach, we can analyze implementations of trees (with left and right fields), but only trees: even parent links would introduce cycles!
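The emptiness check in the last step is a least-fixpoint computation. This Python sketch assumes an unlabeled bottom-up automaton over binary trees (MONA's automata also read node labels and use a symbolic representation, both omitted here); the function names are hypothetical. It collects every state that can label the root of some finite tree and reports the language empty iff no accepting state is among them.

```python
from typing import Dict, Hashable, Set, Tuple

State = Hashable
Transitions = Dict[Tuple[State, State], Set[State]]


def inhabited_states(leaf_states: Set[State], delta: Transitions) -> Set[State]:
    """States that label the root of at least one finite binary tree.
    Leaves may take any state in leaf_states; an inner node whose children
    are in states (l, r) may take any state in delta[(l, r)]."""
    reach = set(leaf_states)
    changed = True
    while changed:  # least fixpoint; terminates because the state set is finite
        changed = False
        for (l, r), targets in delta.items():
            if l in reach and r in reach and not targets <= reach:
                reach |= targets
                changed = True
    return reach


def is_empty(leaf_states: Set[State], delta: Transitions,
             accepting: Set[State]) -> bool:
    """The language is empty iff no accepting state is inhabited."""
    return not (inhabited_states(leaf_states, delta) & accepting)
```

In the test example, a parity automaton accepting trees with an even number of leaves has a non-empty language.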

Beyond trees


Field constraint analysis

Enables reasoning about non-tree fields

Can handle a broader class of data structures
– doubly-linked lists, trees with parent pointers
– skip lists

[Figure: a skip list whose next fields form the tree backbone and whose nextSub fields are constrained fields]

Constrained fields satisfy a constraint invariant: ALL x y. nextSub(x) = y → next+(x, y)
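As an illustration of what the constraint says, this Python sketch (hypothetical names; fields are modeled as dictionaries from objects to objects) checks it on one concrete skip list: every nextSub edge must point to an object that the next backbone reaches in one or more steps. Many different nextSub fields satisfy the same constraint for a given next.

```python
from typing import Dict, Hashable, Optional, Set

Field = Dict[Hashable, Optional[Hashable]]  # object -> target object (None = null)


def next_plus(next_: Field, x: Hashable) -> Set[Hashable]:
    """Objects reachable from x by one or more next steps (next+)."""
    out: Set[Hashable] = set()
    y = next_.get(x)
    while y is not None and y not in out:
        out.add(y)
        y = next_.get(y)
    return out


def field_constraint_holds(next_: Field, next_sub: Field) -> bool:
    """ALL x y. nextSub(x) = y --> next+(x, y), for non-null y
    (treating null targets as unconstrained is an assumption of this sketch)."""
    return all(y in next_plus(next_, x)
               for x, y in next_sub.items() if y is not None)
```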

Elimination of constrained fields

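[Figure: the field constraint analysis (VMCAI'06) sits in front of MONA and eliminates the constrained fields (nextSub), leaving only the tree backbone (next)]

The analysis transforms a verification condition VC1(next, nextSub) into VC2(next), which MONA can decide:
– soundness: if VC2 is valid, then VC1 is valid
– completeness: if VC2 is invalid, then VC1 is invalid (for a useful class of formulas, including preservation of field constraints)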

Previous approaches
– the constraining formula must be deterministic

We allow arbitrary constraint formulas
– fields need not be uniquely determined by the backbone

Inferring invariants


Verifying code with loops would need loop invariants, which the Bohne analysis infers.

Loop invariant synthesis

[Figure: possible heaps at entry to List.remove(c), with root and c pointing into lists of various lengths]

Problem: unbounded number of objects
Solution: partition objects into sets

Partitioning with reachability

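Partitioning properties of objects:
– Proot: pointed to by root
– Pc: pointed to by c
– Rc: reachable from c

Group nodes according to whether these properties hold; for example, the group !Proot & !Pc & !Rc contains the objects that are pointed to by neither root nor c and are not reachable from c. The result is an abstract heap, which represents an unbounded number of concrete heaps.

[Figure: two abstract heaps for the states at entry to List.remove(c): one where c lies behind root, and one where root and c point to the same object]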

As a formula, the two abstract heaps are:

(∀x. (Proot(x) & !Pc(x) & !Rc(x)) | (!Proot(x) & !Pc(x) & !Rc(x)) | (!Proot(x) & Pc(x) & Rc(x)) | (!Proot(x) & !Pc(x) & Rc(x)))
| (∀x. (Proot(x) & Pc(x) & Rc(x)) | (!Proot(x) & !Pc(x) & Rc(x)))

Domain for inferring loop invariants

⋁_c ∀x. ⋁_b ⋀_a p_{a,b,c}(x)

– the conjunction ⋀_a is a summary node, built from partitioning properties and their negations (e.g. Rc, !Rc)
– ∀x. ⋁_b is an abstract heap: every object belongs to one of its summary nodes (C1, C2, C3, C4 in the figure)
– the outer disjunction ⋁_c is the set of possible abstract heaps at a given program point

[Figure: example summary nodes such as !Proot & !Pc & !Rc, Proot, Pc, !Pc & Rc, and !Proot & Rroot & !Px & !Rx]

Background for this domain: POPL'02 (graph-based), SAS'03 (undecidability), VMCAI'04 (formulas), SAS'05 (Podelski, Wies).
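A concrete-heap view of this abstraction, as a Python sketch under simplifying assumptions (objects are strings, the heap has a single next field, Rc is reflexive, and all names are hypothetical): each object is mapped to its valuation of Proot, Pc, and Rc, and the abstract heap is the set of valuations that occur. Lists of different lengths then collapse to the same abstract heap. Bohne itself does not enumerate concrete heaps; it manipulates the formulas above using a decision procedure.

```python
from typing import Dict, FrozenSet, Optional, Set, Tuple

Heap = Dict[str, Optional[str]]      # next field: object -> successor or None
Valuation = Tuple[bool, bool, bool]  # (Proot, Pc, Rc)


def reach(next_: Heap, start: Optional[str]) -> Set[str]:
    """Reflexive-transitive reachability next*(start, x)."""
    seen: Set[str] = set()
    x = start
    while x is not None and x not in seen:
        seen.add(x)
        x = next_.get(x)
    return seen


def abstract_heap(next_: Heap, root: Optional[str],
                  c: Optional[str]) -> FrozenSet[Valuation]:
    """Evaluate Proot, Pc, Rc on every object; each distinct valuation that
    occurs is one summary node, and their set is the abstract heap."""
    rc = reach(next_, c)
    return frozenset((x == root, x == c, x in rc) for x in next_)
```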

Compared to predicate abstraction

Predicate abstraction uses formulas of the form ⋁_b ⋀_a p_{a,b}
– here, predicates range over an object x and the state, not just over the state
– this enables the needed precision and efficiency

Propagating abstract heaps

[Figure: initial abstract heaps propagated through the statements n = c.next and p = c.prev]

Finite state space: explore it using a worklist algorithm.

How to compute whether an abstract heap F2 is a successor of F1? Use the verification condition generator!
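The exploration itself is a standard worklist algorithm. In this Python sketch (hypothetical names) abstract heaps are opaque hashable values and the successor function post is supplied by the caller; in Bohne, that function is where the decision-procedure queries would occur.

```python
from collections import deque
from typing import Callable, Dict, Hashable, Iterable, List, Set, Tuple

Cfg = Dict[str, List[Tuple[str, str]]]  # point -> [(basic block, successor point)]


def explore(cfg: Cfg, post: Callable[[Hashable, str], Iterable[Hashable]],
            start: str, initial: Iterable[Hashable]) -> Dict[str, Set[Hashable]]:
    """Compute the abstract heaps reachable at each program point.
    post(heap, block) returns the abstract heaps that may follow `heap`
    after executing `block`."""
    reached: Dict[str, Set[Hashable]] = {p: set() for p in cfg}
    work = deque()
    for h in initial:
        if h not in reached[start]:
            reached[start].add(h)
            work.append((start, h))
    while work:
        point, heap = work.popleft()
        for block, target in cfg[point]:
            for h2 in post(heap, block):
                if h2 not in reached[target]:  # only new states are re-explored
                    reached[target].add(h2)
                    work.append((target, h2))
    return reached
```

For instance, with program points loop and end and a made-up transition table in which step maps H0 to H1 and H1 to H1 or H2, explore reaches {H0, H1, H2} at both points.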

Computing transitions

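[Figure: invariant synthesis in the Bohne analysis passes (F1, basic block, F2) to the verification condition generator, and the resulting formula to the decision procedure]

For abstract heaps F1 and F2, the verification condition generator combines F1 with wlp(basic block, ...), and the decision procedure decides whether the transition from F1 to F2 is possible: it can be ruled out exactly when no state satisfying F1 leads, through the basic block, to a state satisfying F2, that is, when F1 → wlp(basic block, !F2) is valid.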

Making invariant synthesis feasible

Naive algorithm: 2^(2^n) queries, for n partitioning properties

Reducing the number of queries
– transform each summary node independently (Cartesian abstraction)
– avoid recomputation
• precompute abstractions of transitions (generalization of Boolean programs)
• precompute unsatisfiable conjunctions
• 'semantic' caching of queries
– auxiliary analysis to propagate true conjuncts

These improvements are crucial for making the analysis feasible.

Analyses Developed in Hob

[Figure: the Hob and Jahob architecture, annotated with approximate analysis speeds]
– 1 line/sec
– 10 lines/sec
– 100 lines/sec using MONA (but could use SAT)
– depends on the graduate student

Outline




Current status: analyzed programs, ongoing work

Future work

Minesweeper experience

[Figure: the Minesweeper cell data structures, with the verified properties: next is acyclic, prev is the inverse of next, and an object is in the hidden cells list iff it is initialized and its isExposed flag is false]

Verified properties meaningful to designers and end users

– disjoint(Hidden.content, Exposed.content)
“A cell is never both hidden and exposed”: consistency needed to understand the game
– !disjoint(Mined.content, Exposed.content) => gameOver
“If a mined cell is exposed, the game is over”: a defining property of the game
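Both properties are statements about sets only, so they can be illustrated without any linked structure. This Python sketch (the Board class, its fields, and the expose contract are hypothetical, including the assumption that expose requires the game not to be over) lets the test check both properties after every move, for every order of play on a tiny board; Hob establishes them for all executions instead.

```python
from dataclasses import dataclass, field
from typing import Hashable, Set


@dataclass
class Board:
    """Set-level view of the game: only the abstract sets, no linked lists."""
    hidden: Set[Hashable]
    exposed: Set[Hashable] = field(default_factory=set)
    mined: Set[Hashable] = field(default_factory=set)
    game_over: bool = False

    def expose(self, c: Hashable) -> None:
        # requires c in Hidden & !gameOver (assumed contract)
        assert c in self.hidden and not self.game_over
        # ensures Hidden' = Hidden - c & Exposed' = Exposed + c
        self.hidden = self.hidden - {c}
        self.exposed = self.exposed | {c}
        if c in self.mined:
            self.game_over = True

    def properties_hold(self) -> bool:
        never_both = self.hidden.isdisjoint(self.exposed)
        mine_exposed = not self.mined.isdisjoint(self.exposed)
        return never_both and (not mine_exposed or self.game_over)
```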

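List with a cursor

spec module IterList {
  specvar Content : Node set;
  specvar Iter : Node set;
  invariant Iter in Content;
  proc remove(n : Node)
    requires n in Content & n != null
    ensures Content' = Content - n & Iter' = Iter - n;
}

impl module IterList {
  var root, current : Node;
  proc remove(n : Node) {
    if (n == root) { root = root.next; }
    if (n == current) { current = current.next; }
    Node prv, nxt;
    prv = n.prev; nxt = n.next;
    if (prv != null) { prv.next = nxt; }
    if (nxt != null) { nxt.prev = prv; }
    n.next = null; n.prev = null;
  }
}

[Figure: a list starting at root, where Content is the whole list and Iter is the part starting at current]

BUG: without the line if (n == current) { current = current.next; }, removing the node under the cursor leaves current pointing to a node that is no longer in the list, so both Iter' = Iter - n and the invariant Iter in Content fail.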
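The bug can be reproduced in a small Python model. The sketch below (hypothetical names; Iter is assumed to be the set of nodes from current onward, which the slide shows only in its figure) runs remove with and without the cursor update; the test checks that only the version with the update keeps Iter' = Iter - n and Iter in Content.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(eq=False, repr=False)
class Node:
    next: Optional["Node"] = None
    prev: Optional["Node"] = None


def from_node(x: Optional[Node]) -> Set[Node]:
    """All nodes reachable from x along next (x included)."""
    out: Set[Node] = set()
    while x is not None:
        out.add(x)
        x = x.next
    return out


class IterList:
    def __init__(self, n: int, fix_cursor: bool = True) -> None:
        self.nodes = [Node() for _ in range(n)]
        for a, b in zip(self.nodes, self.nodes[1:]):
            a.next, b.prev = b, a
        self.root: Optional[Node] = self.nodes[0] if self.nodes else None
        self.current = self.root
        self.fix_cursor = fix_cursor  # False reproduces the buggy version

    def content(self) -> Set[Node]:
        return from_node(self.root)

    def iter_set(self) -> Set[Node]:  # assumed abstraction: nodes from current on
        return from_node(self.current)

    def remove(self, n: Node) -> None:
        assert n is not None and n in self.content()  # requires
        if n is self.root:
            self.root = self.root.next
        if self.fix_cursor and n is self.current:
            self.current = self.current.next
        prv, nxt = n.prev, n.next
        if prv is not None:
            prv.next = nxt
        if nxt is not None:
            nxt.prev = prv
        n.next = n.prev = None
```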

Verifying use of cursors

Client code:

List.openIter();
bool b = List.isLastIter();
while (!b) {
  c = List.nextIter();
  View.drawCell(c);
  b = List.isLastIter();
}

Interface:

spec module IterList {
  specvar Content, Iter : Node set;
  invariant Iter in Content;
  proc isLastIter() returns b : bool
    ensures b' <=> (Iter' = {});
  proc nextIter() returns n : Node
    requires Iter != {}
    modifies Iter
    ensures (n != null) & (n in Iter) & (Iter' = Iter - n) & (n in Content);
}

Verified properties:
– the iterator is initialized before use
– no iteration past the end
– each cell is visited exactly once
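Because the interface is stated in terms of sets, the client loop can be exercised against a set-level model of it. In this Python sketch (hypothetical names; cells are integers, and nextIter picks the smallest remaining element, one of the choices the specification permits) assertions encode the requires clauses, so a call before openIter or past the end fails, and the loop draws each cell exactly once.

```python
from typing import Callable, Optional, Set


class IterListSpec:
    """Executable model of the IterList interface at the level of sets."""

    def __init__(self, content: Set[int]) -> None:
        self.content = set(content)
        self.iter: Optional[Set[int]] = None  # None: openIter not yet called

    def open_iter(self) -> None:
        self.iter = set(self.content)

    def is_last_iter(self) -> bool:
        assert self.iter is not None, "iterator used before openIter"
        return self.iter == set()  # b <=> (Iter = {})

    def next_iter(self) -> int:
        assert self.iter is not None, "iterator used before openIter"
        assert self.iter, "requires Iter != {}: iteration past the end"
        n = min(self.iter)  # the spec allows any element; pick one deterministically
        self.iter = self.iter - {n}  # ensures Iter' = Iter - n
        assert n in self.content
        return n


def draw_all(lst: IterListSpec, draw: Callable[[int], None]) -> None:
    """The client loop from the slide, with View.drawCell passed in as draw."""
    lst.open_iter()
    b = lst.is_last_iter()
    while not b:
        c = lst.next_iter()
        draw(c)
        b = lst.is_last_iter()
```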

Further analyzed programs
– Water particle simulation: ordering of computation phases
– Web server: initialization, ordering, data structures (serving http://hob.csail.mit.edu)

High-level properties
– relationships between different data structures
– none of the individual analyses could handle them alone

Individual data structures
– trees (with parent pointers), doubly-linked lists (with cursors)
– skip lists, lists with cross pointers, arrays, priority queues

Ongoing work
– turn-based strategy game, collection classes
– operating system data structures

Jahob system


Successor to Hob

Goal: check data structures in more scenarios
– richer interfaces and invariants
• maps to specify association lists, hash tables
• relations to specify an unbounded number of instances
• symbolic cardinality constraints on sets
– future extension to other properties

Implementation language: Java subset

Specification language: Isabelle subset

New specialized decision procedures

Fine-grained combination of logics


Relational interfaces: impl and use

Logics for verifying uses of relations
– two-variable logic with counting (SAS'04)
– fragments of first-order logic (AIOOL'05)

[Figure: object model with Person and Book related by borrows, multiplicities [0..4] and [0..1]; an instance with persons 1, 2, 3 and books A, B where borrows = {(1, A), (2, B), (3, B)}]

Implementation of relations: use multiple logics for each verification condition

New high-level analysis: a modular methodology that supports an object-oriented style

New decision procedures


BAPA: Sets with cardinality bounds

Imposing constraints on abstract content
– card(content) = size: the size field is consistent with the number of stored objects
– card(a.content) = card(b.content)

[Figure: a list with a first pointer, next fields, and a size field equal to 3]

Boolean Algebra with Presburger Arithmetic

Not widely known, but a natural extension of Boolean algebras

Gave the first complexity bound (CADE'05, JAR)
– quantifier elimination algorithm (as in LICS'03)

Recent results (see technical report, submissions)
– first PSPACE algorithm for the quantifier-free fragment
– identified a new useful polynomial-time fragment

Syntax:
S ::= V | S1 ∪ S2 | S1 ∩ S2 | S1 \ S2
T ::= k | C | T1 + T2 | T1 - T2 | C·T | card(S)
A ::= S1 = S2 | S1 ⊆ S2 | T1 = T2 | T1 < T2
F ::= A | F1 ∧ F2 | F1 ∨ F2 | ¬F | ∃S.F | ∃k.F

From BAPA to PA

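If A and B are disjoint, then |A ∪ B| = |A| + |B|.

Make the sets disjoint using a Venn diagram: three set variables x, y, z yield 8 disjoint regions, such as x^c ∩ y ∩ z^c, and the cardinality of each region, for example |x^c ∩ y ∩ z^c|, becomes an integer variable.
– reduce set variables to integer variables
– for quantifiers, use quantifier elimination
– preserves quantifier alternations; elementary complexity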
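The reduction step can be illustrated on concrete finite sets. This Python sketch (hypothetical names) computes the size of each of the 2^k Venn regions for k sets and shows that the cardinality of a Boolean combination is the sum of the region sizes it covers; in the translation to Presburger arithmetic these region sizes become the integer variables. It evaluates concrete sets and does not decide BAPA formulas.

```python
from itertools import product
from typing import Callable, Dict, Hashable, Set, Tuple

Pattern = Tuple[bool, ...]  # membership in each set, in sorted name order


def region_sizes(universe: Set[Hashable],
                 sets: Dict[str, Set[Hashable]]) -> Dict[Pattern, int]:
    """Size of each Venn region s1^p1 ∩ ... ∩ sk^pk, where True stands for the
    set itself and False for its complement within the universe."""
    names = sorted(sets)
    sizes = {p: 0 for p in product([False, True], repeat=len(names))}
    for e in universe:
        sizes[tuple(e in sets[name] for name in names)] += 1
    return sizes


def card(term: Callable[[Pattern], bool], sizes: Dict[Pattern, int]) -> int:
    """card of a Boolean combination of sets = sum of the regions it covers."""
    return sum(v for p, v in sizes.items() if term(p))
```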

Quantifier-free BAPA

The previous technique gives NEXPTIME.

We can do it in PSPACE:
– analyze the resulting equations: exponentially many variables, but polynomially many equations; finite model property: solutions are at most singly exponential
– guess the sizes of sets
– use an alternating PTIME algorithm to check them

We also identified a tree-like fragment that is in PTIME.

Hob and Jahob systems


Future work: roadmap

So far: conformance of code to model (specification)

Next: address the construction of models
– counterexamples for models (Alloy, FSE'05)
– testing, run-time checking of specifications
– efficient execution of declarative specifications

Fostering adoption of specifications
– inference, syntax, quantifiers, defaults, templates

Deploy within software development environments

Integrate domain-specific knowledge
– operating systems, games, embedded systems

Related work

Comparison to modular set-based analysis
– like it, our approach is
• modular (but our contracts address mutation and the heap rather than higher-order functions)
• based on sets and relations (of objects, not terms); note: LICS'03
– we also have
• data abstraction: public and private contracts, abstraction functions
• flow sensitivity and mutation: typestate
• shape properties: relationships between typestates
• different analyses in different modules
– but so far we do not have: higher-order functions, contract inference, two-level constraints, IDE integration

Related work: shape analysis

– Jones, Muchnick '79: memory optimizations
– Larus, Hilfinger '88: detecting conflicts in memory accesses
– Hendren, Nicolau '90: parallelization, connection analysis
– Chase, Wegman, Zadeck '90: allocation-site model
– Klarlund, Schwartzbach '93: graph types
– Deutsch '94: symbolic bounds on paths
– Fradet, Le Métayer '97: graph grammars
– Sagiv, Reps, Wilhelm '99: 3-valued framework
– Lev-Ami, Sagiv '00: TVLA implementation
– Moeller, Schwartzbach '01: PALE based on MONA
– Yorsh, Reps, Sagiv '04: assume/guarantee reasoning for 3VL
– McPeak, Necula '05: local pointer properties
– Rugina, Hackett '05: region-based
– Lee, Yang, Yi '05: combining three-valued and grammar-based

Related work: model checking

– Holzmann '97: SPIN
– Burch, Clarke, Long, McMillan, Dill '92: SMV
– Pisman, Pnueli '01: non-regular infinite state systems

Predicate abstraction: extracting models
– Graf, Saidi '97: using PVS
– Ball, Podelski '01: Cartesian abstraction
– Ball, Majumdar, Millstein, Rajamani '01: SLAM
– Henzinger, Jhala, Sutre '02: BLAST
– Flanagan, Qadeer '02: use of Skolem constants
– Lahiri, Seshia, Bryant '04: UCLID, indexed predicates
– Balaban, Pnueli, Zuck '05: small models for lists
– Bingham, Rakamaric '06: abstraction of lists
– Lahiri, Qadeer '06: lists and data properties

Related work: decision procedures and theorem provers

– Barrett, Berezin '04: CVC Lite
– Detlefs, Nelson, Saxe '03: Simplify
– Ball, Lahiri, Musuvathi '05: Zap
– Thatcher, Wright '68: MSOL over finite trees
– Klarlund, Moeller, Schwartzbach '00: MONA
– Yorsh, Rabinovich, Sagiv, Meyer, Bouajjani '06: reachability logic
– BAPA: Feferman, Vaught '59; Zarba '04, '05
– Voronkov '95: Vampire; Weidenbach '01: Spass
– Gordon '85: HOL; Pfenning '91: LF; Coquand, Huet '85: Coq
– Constable, Allen, Bromley, Cleaveland, Cremer, Harper, Howe, Knoblock, Mendler, Panangaden, Sasaki, Smith '86: NuPRL
– Gray, Hickey, Nogin, Tapus: MetaPRL
– Kaufmann, Manolios, Moore '00: ACL2
– Nipkow, Paulson, Wenzel '02: Isabelle

Related work: program verification systems

– King '70, Deutsch '73, Suzuki '73, Nelson '81, Guttag, Horning '93
– Good, Akers, Smith '86: Gypsy
– Jones '86: VDM
– Abrial, Lee, Neilson, Scharbach, Soerensen '91: B method
– Owre, Shankar, Rushby, Stringer-Calvert: PVS
– Ahrendt, Baar, Beckert, Giese, Habermalz, Haehnle, Menzel, Schmitt '00: KeY
– Foulger, King '01: SPARK Ada
– Flanagan, Leino, Lillibridge, Nelson, Saxe, Stata '02: ESC/Java
– Marche, Paulin-Mohring, Urbain '03: Krakatoa
– Breunesse, Poll '05: model fields in JML
– Barnett, DeLine, Jacobs, Fähndrich, Leino, Schulte, Venter '05: Spec#
– Leino, Mueller '06: model fields in Spec#

Conclusions

Goal: statically verify data structure consistency

Hob system: language, framework, analyses
– specification language based on sets
– new shape analysis, new high-level analysis
– analyzed Minesweeper, water simulation, web server
• detailed data structure properties: trees, arrays, ...
• properties meaningful to users of the system

Jahob system
– richer specification language with relations
– new decision procedures and analyses

Related work

Array bounds checking
– Bodik, Gupta, Sarkar '00: demand-driven
– Rugina, Rinard '00: bounds and region analysis

Pointer analyses
– Steensgaard '96: points-to in almost linear time
– Andersen '94: inclusion constraints
– Fähndrich, Rehof, Das '00: instantiation constraints
– Salcianu, Rinard '05: side-effect analysis
– Sridharan, Gopan, Shan, Bodik '05: demand-driven
– Sridharan, Bodik '06: refinement-based

Cost of analyzing data structures

– Doubly exponential state space
– Non-elementary decision procedure

Examples:
– in-place reversal of a list using a loop: content' = content, and the structure remains an acyclic list (5 seconds)
– 2-level skip list insertion: content' = content ∪ {x}, and the structure remains a skip list (35 seconds)
– insertion into a tree with parent pointers: content' = content ∪ {x}, and the structure remains a parent tree (83 seconds)

Related work

Type systems
– Freeman, Pfenning '91: refinement types
– Xi, Pfenning '99: dependent ML
– Harren, Necula '05: dependent types in typed assembly
– Smith, Walker, Morrisett '00: alias types

Typestate systems
– Strom, Yemini '86: typestate for initialization
– Fähndrich, DeLine '01, '04: finite state protocols
– Das, Lerner, Seigle '02: typestate inference
– Ramalingam, Warshavsky, Field, Goyal, Sagiv '02

Related work: bug finding and dynamic specification synthesis

– Jackson, Vaziri '00: finding bugs in code with Alloy
– Taghdiri '04: counterexample-driven refinement
– Xie, Aiken '05: Saturn
– Evans '94: LCLint
– Engler, Musuvathi '00: metacompilation
– Hovemeyer, Pugh '04: FindBugs
– Boyapati, Khurshid, Marinov '02: Korat
– Sen, Marinov, Agha: CUTE
– Ernst, Czeisler, Griswold, Notkin '00: dynamic invariant inference
– Ammons, Bodik, Larus '02: dynamic finite state inference

Additional details and topics

Decidability of structural subtyping

Relational reasoning about datatypes

Two-variable logic and spatial conjunction

Boolean algebra with Presburger arithmetic

High-level analysis using set algebra
