Formally Verifying a File System: A Successful Failure

CSCI-P515/P415, Spring 2008

Michael Adams (adamsmd@cs.indiana.edu)
Joseph Near (jnear@cs.indiana.edu)
Aaron Kahn (aakahn@cs.indiana.edu)

Overview

Motivation
High Level Design
Approach
Minor Difficulties (and their solutions)
Major (Fatal) Difficulty (and explanation)
The Proposed Solution
Recap/Summation

Motivation

Our goal for this project was to attempt to formally verify a file system
◦ We were under the impression that this would be a straightforward task, and that as long as the abstraction was simple, there wouldn't be any major problems

Limitations

Are doing:
◦ Take a file number, and read from/write to that file
◦ Create/delete files

Not doing:
◦ Directories
◦ File names
◦ Permissions, users, groups, etc.

The features we're not doing can be added as an abstraction on top of what we are doing

Design

Develop a B-tree structure
◦ The B-tree is actually serialized onto a disk
  ◦ Disk represented as an array of bytes

Create the B-tree algorithms
◦ insert, delete, lookup

Write the file system algorithms (read file, write file, create file, etc.) in terms of the B-tree algorithms

Process

Initially, we wrote the code in Scheme in order to have a fully working model of “live” code to test on, and then translated it into PVS

In PVS, the file system was abstracted all the way down to a disk representation to allow for better simulation of the real problems of writing file systems
◦ This turned out to be essential to our learning the difficulties of actually verifying a file system

Additional Structures

In addition to the B-tree, we found that these auxiliary structures were needed
◦ A free list
◦ Blocks that represent files themselves, but are not part of the B-tree
◦ A single block that holds the pointers to the root of the free list and the root of the B-tree (similar to a meta-data block)
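One way these structures could sit on a byte-array disk is sketched below. This is only an illustration under assumptions that are not from the slides: the 4-byte block size, the one-byte block pointers, the names `format_disk` and `allocate`, and the choice of block 0 as the meta-data block are all hypothetical.

```python
BLOCK = 4        # hypothetical block size in bytes
FREE_HEAD = 0    # byte of block 0 holding the first free block number
BTREE_ROOT = 1   # byte of block 0 holding the B-tree root block number
NIL = 0          # block 0 is the meta-data block, so 0 doubles as "none"


def format_disk(blocks: int) -> bytes:
    """Chain blocks 1..blocks-1 into a free list rooted in block 0."""
    assert 1 <= blocks <= 256  # pointers are one byte in this sketch
    disk = bytearray(BLOCK * blocks)
    disk[FREE_HEAD] = 1 if blocks > 1 else NIL
    for b in range(1, blocks):
        # The first byte of a free block points at the next free block.
        disk[b * BLOCK] = b + 1 if b + 1 < blocks else NIL
    return bytes(disk)


def allocate(disk: bytes):
    """Pop the head of the free list; return (block number, new disk)."""
    head = disk[FREE_HEAD]
    if head == NIL:
        raise MemoryError("no free blocks")
    new = bytearray(disk)
    new[FREE_HEAD] = disk[head * BLOCK]  # next free block becomes the head
    new[head * BLOCK] = 0                # clear the old link
    return head, bytes(new)
```

Because `allocate` returns a new disk rather than mutating the old one, it follows the same state-passing style the project used in PVS.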

Accomplishments

B-tree in Scheme
◦ Thoroughly tested

Were able to successfully translate our code into PVS

Made a number of discoveries in terms of tricks for proving the algorithms in PVS
◦ However, very late in the game, we discovered a fatal limitation of how we modelled things in PVS

Have ideas for overcoming the problems in the future

Minor Problems

In a large project, there are many minor problems that are surprisingly difficult to solve

These often require the development of a simple but non-obvious trick

We ran into and solved many of these; here is a sample of what we learned
◦ More detail included in report

Search

search(array, start, stop, val)

Search through a sorted array for the first value greater than or equal to the argument; return the position of that value

If no element is greater than or equal to the argument, return the length of the array

Unexpectedly difficult to prove
Measured induction on stop - start
Ended up using max(0, stop - start)
Lesson: make sure the measure is well-founded; sometimes forcing it to be well-founded works
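The search and its measure can be sketched in Python as follows. This is a minimal illustration, not the Scheme or PVS code from the project; the helper name `measure` is an assumption, and the run-time `assert` only stands in for the termination obligation that PVS would ask the user to prove.

```python
def measure(start: int, stop: int) -> int:
    """Termination measure: a natural number even when stop < start."""
    return max(0, stop - start)


def search(array, start, stop, val):
    """Return the index of the first element >= val in array[start:stop].

    Returns len(array) when no such element exists in that range.
    """
    if start >= stop:
        # The measure is 0 here: nothing is left to scan.
        return len(array)
    if array[start] >= val:
        return start
    # The recursive call strictly decreases the measure.
    assert measure(start + 1, stop) < measure(start, stop)
    return search(array, start + 1, stop, val)
```

Using plain `stop - start` as the measure would go negative whenever `start > stop`, so it is not a natural number; clamping with `max(0, ...)` avoids that case.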

Well-formedness

Designed as part of our testing; believed to be an important part of the proof

Theory: algorithms are correct if they have the desired effect and the disk remains well-formed

Assuming a well-formed disk should give us a basis for proving correctness of our operations

Proved that a newly-formatted disk is well-formed

Partially proved that allocation preserves well-formedness

Well-formedness

Realization: well-formedness is irrelevant!

Well-formedness is defined by the observer (in this case, lookup)
◦ lookup(key, insert(key, value, disk)) = value

If the observer can correctly interpret the data given to it, then that data is well-formed

Lesson: don't waste time proving things about well-formedness
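The observer property can be checked on a toy model. The sketch below is an assumption-laden stand-in for the real B-tree: the three-byte slot layout and `format_disk` are hypothetical, and only the argument order of `insert` and `lookup` follows the slide.

```python
SLOT = 3  # hypothetical layout: [used flag, key, value], one byte each


def format_disk(slots: int) -> bytes:
    """Return an empty disk with room for `slots` bindings."""
    return bytes(SLOT * slots)


def insert(key: int, value: int, disk: bytes) -> bytes:
    """Return a new disk with key bound to value (state-passing style)."""
    target = None
    for off in range(0, len(disk), SLOT):
        used, k = disk[off], disk[off + 1]
        if used and k == key:
            target = off  # overwrite the existing binding
            break
        if not used and target is None:
            target = off  # remember the first free slot
    if target is None:
        raise ValueError("disk full")
    return disk[:target] + bytes([1, key, value]) + disk[target + SLOT:]


def lookup(key: int, disk: bytes):
    """Return the value bound to key, or None if key is absent."""
    for off in range(0, len(disk), SLOT):
        if disk[off] and disk[off + 1] == key:
            return disk[off + 2]
    return None
```

Nothing here defines a separate well-formedness predicate: the disk counts as acceptable exactly when `lookup` reads back what `insert` wrote.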

Proving insert

Many uses of let due to state-passing style
◦ Exponential blowup of expression size
◦ Sequents become pages long!

Side effects make proofs difficult
◦ When an object is modified, the sequent clauses about it no longer apply, even if the change doesn't affect them
◦ User has to prove that the sequent clauses still apply

Main Problem: Side Effects

State-Passing Style
Good for modelling state
◦ Easy to implement, familiar
Bad for proving!

The problem with side effects

Effects invalidate assumptions
Given a property about a disk, we need to prove the same property about a modified disk

Example:
◦ If P(disk) then P(write_block(block, disk))
◦ Even if the effect does not affect P, we have to prove that P still holds
◦ This makes sense: it does not hold automatically!
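A small Python sketch shows why the property has to be re-established after each write. Everything concrete here is an assumption for illustration: the block size, the three-argument form of `write_block` (the slide passes a single block argument), and the property `p`.

```python
BLOCK_SIZE = 4  # hypothetical block size for illustration


def write_block(index: int, data: bytes, disk: bytes) -> bytes:
    """Return a new disk with block `index` replaced by `data`."""
    assert len(data) == BLOCK_SIZE
    start = index * BLOCK_SIZE
    assert 0 <= start and start + BLOCK_SIZE <= len(disk)
    return disk[:start] + data + disk[start + BLOCK_SIZE:]


def p(disk: bytes) -> bool:
    """Hypothetical property: block 0 begins with the marker byte 0x42."""
    return disk[0] == 0x42


disk = bytes([0x42, 0, 0, 0]) + bytes(BLOCK_SIZE * 2)

# Writing block 2 leaves block 0 alone, so p survives.
after_far_write = write_block(2, b"\x01\x02\x03\x04", disk)

# Writing block 0 overwrites the marker, so p is lost.
after_near_write = write_block(0, b"\x00\x00\x00\x00", disk)
```

Both results have the same type as the original disk, so nothing in the types says which write preserves `p`; a prover has to reason about the byte ranges each time.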

Obvious solution: Hoare Logic

Substitution enforces separation of variables
◦ So P(x) => P(x) automatically as long as x isn't affected

Red herring: this only helps if we use Abstract Data Types

We serialize our ADT into a single disk object
◦ Side-effecting one part will side-effect all parts, even if we use Hoare Logic

Naive Solution

Prove that side-effecting one part of the B-tree, free list, etc. doesn't affect assumptions about other parts of the disk

Possible, but impractical
◦ For every algorithm
  ◦ For every effect
    ◦ For every clause of the sequent
      ◦ Must prove that the assumption still holds after the effect
◦ A few such basic proofs were accomplished
  ◦ But even they were long and easy to get lost in

What we want from a better solution

We want to write ADT style code
We want to write ADT style proofs
We want to push a button and have
◦ Serialized style code
◦ Serialized style proofs

Is it possible???

What a solution would look like

Serialization theorems
◦ Example: deserialize(serialize(n)) = n
◦ Fairly easy to prove
  ◦ Already done
  ◦ Even grind could do it

Proof that changing one value doesn't affect other values
◦ Hmm...
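Both kinds of theorem can be stated as executable checks. The sketch below assumes fixed-width unsigned big-endian integers and two hypothetical helpers, `write_field` and `read_field`; the project's actual encoding is not shown in the slides.

```python
WIDTH = 4  # hypothetical: fixed-width unsigned big-endian integers


def serialize(n: int) -> bytes:
    """Encode n as WIDTH bytes; raises OverflowError if n does not fit."""
    return n.to_bytes(WIDTH, "big")


def deserialize(raw: bytes) -> int:
    """Decode the bytes produced by serialize."""
    return int.from_bytes(raw, "big")


def write_field(slot: int, n: int, disk: bytes) -> bytes:
    """Return a new disk with field number `slot` set to n."""
    start = slot * WIDTH
    assert 0 <= start and start + WIDTH <= len(disk)
    return disk[:start] + serialize(n) + disk[start + WIDTH:]


def read_field(slot: int, disk: bytes) -> int:
    """Read field number `slot` from the disk."""
    start = slot * WIDTH
    return deserialize(disk[start:start + WIDTH])
```

The round trip `deserialize(serialize(n)) == n` is the easy theorem. The harder one is independence: `read_field(j, write_field(i, n, disk)) == read_field(j, disk)` whenever `i != j`, which here depends on the fields occupying disjoint byte ranges.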

Proof of effect independence

Language run-time for ADTs is already doing this
◦ Objects are serialized to memory

Language run-time limitations
◦ Language vs. programmer control of serialization
◦ The garbage collector
  ◦ Known hard problem
  ◦ Bad idea on a hard disk

How to avoid GC

We don't need general GC

Side-effect view:
◦ A value is only “modified” if we hold the only reference to it
  ◦ Or if it is not reachable from values used in theorems

ADT view:
◦ Values are only “allocated” if we are “freeing” another value

Solution: ...

Linear Types!!!

What are Linear Types?

Objects must always have exactly one reference
◦ No duplication
◦ No erasure

No GC needed
◦ Look Ma, No Garbage!
◦ “Modifying” something is “de-alloc” plus “alloc”

Our algorithms already treat objects as linear

Just need to teach PVS to take advantage of that
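Python has no linear type system, so the idea can only be approximated at run time. The sketch below uses a hypothetical `Linear` wrapper that rejects a second use of a value; it does not enforce "no erasure", and it is not how PVS or any linearly typed language would check this statically.

```python
class Linear:
    """Hypothetical single-use handle: the wrapped value may be taken once."""

    def __init__(self, value):
        self._value = value
        self._live = True

    def take(self):
        """Hand over the value and invalidate this handle."""
        if not self._live:
            raise RuntimeError("linear value used more than once")
        self._live = False
        value, self._value = self._value, None
        return value


def write_byte(handle: Linear, offset: int, byte: int) -> Linear:
    """Consume the old disk handle ("de-alloc") and return a new one ("alloc")."""
    disk = handle.take()
    return Linear(disk[:offset] + bytes([byte]) + disk[offset + 1:])
```

After `d1 = write_byte(d0, 1, 9)`, the old handle `d0` is dead, so no theorem can still be talking about the pre-write disk through it. That is the guarantee a linear type checker would give without any run-time check.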

Linear Types vs. Monads

Linear types lost the battle of representing state to monads

Maybe they could win the war for formal proofs

Pros and cons
◦ Monads are more general
  ◦ Non-determinism, environments, etc.
◦ Linear types provide more guarantees
  ◦ A reference to a linearly typed object is guaranteed to be the only reference

Recap

File systems are full of bugs
◦ But it is critical that they be right
◦ Verification could fix this

We designed and implemented a file system
◦ B-tree based
◦ Modelled all the way to “disk”
◦ Auxiliary structures needed
  ◦ Free list
  ◦ File blocks
  ◦ Root file system block

Recap

We proved linear search
◦ Lesson: Make sure measures are well-founded
◦ Lesson: Make measures well-founded if they aren't

Well-formedness
◦ Red herring
◦ Actually defined by observers

Exponential blow-up due to let
◦ Possible improvement in how PVS presents sequents

Recap

Side effects are hard in an unexpected way
◦ Implementing side effects in PVS is easy
  ◦ Use State-Passing Style (e.g. State monad)
◦ Proving side effects in a serialized common store is hard
  ◦ Must prove that every effect keeps the theorems true

Number of proofs exploded beyond our ability

Recap

Linear Types to the Rescue!!
◦ User writes ADT style proofs
◦ System converts them to serialized proofs
◦ Better than monads

Need a theory for Linear Types in PVS

Final Results

Ultimately had to declare failure
◦ Code is fragmentary

But learned more from failure than success
◦ Main deliverable is the report and what not to do
◦ We have good ideas for how to make future attempts

... and we don't feel too bad because others have estimated verifying a file system to take 2-3 years to accomplish.
◦ A mini-challenge: build a verifiable filesystem. Rajeev Joshi, Gerard J. Holzmann
