
Integrating Model Checking and Procedural Languages

David Owen

July 19, 2004

Overview

• Background: verification / search tools, criteria for when to use which tool, combining different strategies.

• Experiments: flight guidance system, leader election protocol, dining philosophers, resource arbiter.

• Implementation: Lurch, our random simulation tool for finite-state models.

• Lean: Lurch + machine learning.

• Lean experiment: Chemical factory optimization.

A Continuum of Testing and Verification Tools

• A range of tools exists, from traditional software testing to automated verification.
  – Simulation tools that approximate full verification but work on more complex models.
  – Sophisticated testing tools capable of detecting more complex errors.

[Figure: continuum between Traditional Software Testing and Automated Verification. Labels: Complex Models / Simple Errors; Simple Models / Complex Errors; Tools to Approximate Full Verification; More Sophisticated Testing Tools; Real Languages; Model Checking.]

Changing Expectations of a Software Analyst

• Cobleigh et al. idea: three modes of analysis.
  – Exploratory mode: quick feedback needed to learn how the system works and refine properties.
  – Fault-finding mode: short and clear error traces needed for debugging.
  – Maintenance mode: completeness, scalability needed to verify overall system.

• Different tools have different strengths.
  – Simulation tools good for exploratory mode.
  – Symbolic model checking good for short error traces.
  – Explicit-state model checking good for speed and scalability.

Combining Complementary Strategies

• Different tools have different strengths and weaknesses.
  – Cobleigh et al. suggest “The Right Algorithm at the Right Time” (ICSE 2001).
  – We’ve had some success with a different approach, combining complementary strategies (regardless of the analyst’s mode).
  – Start with a quick, incomplete tool; if no errors are found after a few seconds, use a model checker (complete verification).

[Figure: Quick, Incomplete Search → Errors Found → Done; Quick, Incomplete Search → No Errors Found → Model Checker.]
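A minimal C sketch of the combined strategy, assuming a hypothetical one-machine toy model in place of a real Lurch, SPIN or NuSMV input. `random_search` stands in for the quick, incomplete tool and `exhaustive_search` (a breadth-first search) for the model checker; the budget is counted in random walks rather than the 5 seconds used in the experiments. All names are illustrative, not any tool's API.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical toy model: one machine with NSTATES local states.
   From state s it may move to (s + 2) % NSTATES or (s * 3) % NSTATES,
   so starting from 0 only even states are reachable. */
#define NSTATES 64

typedef enum { FOUND_ERROR, NO_ERROR_SEEN, VERIFIED } Result;

int successor(int s, int choice) {
    return choice ? (s * 3) % NSTATES : (s + 2) % NSTATES;
}

/* Quick, incomplete search: `iterations` random walks of at most `depth` steps.
   It can report an error, but "no error seen" proves nothing. */
Result random_search(int fault, int iterations, int depth, unsigned seed) {
    srand(seed);
    for (int i = 0; i < iterations; i++) {
        int s = 0;                                   /* initial state */
        for (int d = 0; d < depth && s != fault; d++)
            s = successor(s, rand() % 2);
        if (s == fault) return FOUND_ERROR;
    }
    return NO_ERROR_SEEN;
}

/* Complete search: breadth-first exploration of every reachable state
   (a stand-in for a model checker). */
Result exhaustive_search(int fault) {
    bool seen[NSTATES] = { false };
    int queue[NSTATES], head = 0, tail = 0;         /* each state is enqueued at most once */
    seen[0] = true;
    queue[tail++] = 0;
    while (head < tail) {
        int s = queue[head++];
        if (s == fault) return FOUND_ERROR;
        for (int c = 0; c < 2; c++) {
            int t = successor(s, c);
            if (!seen[t]) { seen[t] = true; queue[tail++] = t; }
        }
    }
    return VERIFIED;
}

/* Combined strategy: spend a small budget on random search first and
   fall back to the complete search only if it finds nothing. */
Result combined_check(int fault, int budget) {
    if (random_search(fault, budget, 32, 1u) == FOUND_ERROR) return FOUND_ERROR;
    return exhaustive_search(fault);
}
```

Because the fallback is exhaustive, the combined check stays complete: an unreachable fault (here, any odd state) ends as `VERIFIED`, while a reachable one is found either quickly by the random walks or, failing that, by the breadth-first search.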

Random Simulation of Concurrent System Models

• Randomized algorithms are known to be simple, fast and effective in many domains.

• West used random simulation to detect errors in concurrent system models.
  – This approach was surprisingly successful.
  – Success was attributed to the fact that most errors detected are much less complex than the overall system.

• We have implemented a similar random simulation in a tool called Lurch.
  – Added early stopping heuristics.
  – C code can be included in the model.

Flight Guidance System Experiment

• Work with Mats Heimdahl and Jimin Gao (University of Minnesota).

• Ran Lurch and NuSMV on a model representing the mode logic from a Rockwell-Collins flight guidance system.
  – Seeded faults based on developers’ revision history.
  – Used NuSMV to (exhaustively) determine which properties were violated by the faulty specifications.
  – Tried to find the violations with Lurch (random simulation of the model).
  – Put Lurch and NuSMV results together to evaluate the combined strategy.

Flight Guidance System Experiment (2)

Time (seconds) to verify or find error; combined = Lurch for 5 sec., then NuSMV if no property violations found by Lurch.

Tool       Statistic   Lurch < 5   Lurch > 5   Lurch ?   Overall
Lurch      average     1.49        553         -         -
           median      1.03        40.1        -         -
           max         4.43        5,400       -         -
NuSMV      average     4,380       12,200      14,000    8,200
           median      3,290       3,890       14,000    3,540
           max         17,500      141,000     27,600    141,000
Combined   average     1.49        12,200      14,000    5,910
           median      1.03        3,890       14,000    3.92
           max         4.43        141,000     27,600    141,000

Lurch < 5 / Lurch > 5: property violations Lurch found in under / over 5 seconds. Lurch ?: property violations not detected by Lurch.

Combined strategy improves average by over ½ hour.

Leader Election Protocol Experiment

• Protocol published as an example for SPIN (Holzmann 1997 TSE article).

• N processes communicating via message queues interact to choose one leader process.

• Checked for liveness property always(eventually(one “leader” chosen)).

• Ran Lurch + SPIN combination strategy on the original and two fault-seeded versions of the model.
  – Seeded faults: where a process is sending out a message, the wrong message type was used.
  – Two different fault-seeded versions created: one that turned out easy, another that turned out harder.


Leader Election Protocol Experiment (2)

Time (seconds) to verify or find error; combined = Lurch for 5 sec., then SPIN if no property violations found by Lurch.

Tool       Statistic   Correct   Fault 1   Fault 2   Overall
Lurch      average     -         0.137     1.60      -
           median      -         0.128     0.183     -
           max         -         0.173     7.19      -
SPIN       average     49.2      0.059     31.2      23.4
           median      4.67      0.055     3.21      0.125
           max         244       0.08      190       244
Combined   average     54.2      0.137     20.4      20.4
           median      9.67      0.128     0.183     0.173
           max         249       0.173     195       249

Although SPIN alone is better on the correct and first fault-seeded versions, the average for the combined strategy is still better overall.


Leader Election Protocol Experiment (3)

• This plot shows the time required for Lurch and SPIN running on a model with both of the seeded faults described previously.
  – Instances with an odd number of processes are much more difficult for SPIN, but not for Lurch.
  – This demonstrates a well-known benefit of some randomized algorithms: less sensitivity to (apparently) minor changes in the input.

Dining Philosophers Experiment

• Two different versions of the problem:
  – Normal: n philosophers seated around a table; each repeatedly tries to acquire the left and right forks, eat, and then set down the forks.
  – No loop: same as the normal version, except philosophers only try to eat once.

• Both versions of the problem contain two deadlocks at depth n.

• We ran Lurch, SPIN and NuSMV until the shortest path to a deadlock was found.

• The normal version was harder for NuSMV and Lurch; the no-loop version was harder for SPIN.
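The no-loop version lends itself to a small random-simulation sketch in C. This is not the Lurch, SPIN or NuSMV model used in the experiment; it assumes a thinking philosopher may pick up either free fork first, which in this encoding gives exactly two deadlocks (everyone holding a left fork, or everyone holding a right fork), each reached after n moves. `World`, `shortest_deadlock` and the other names are hypothetical.

```c
#include <stdbool.h>
#include <stdlib.h>

#define MAXN 16   /* assumes 2 <= n <= MAXN philosophers */
enum { THINKING, HAS_LEFT, HAS_RIGHT, DONE };

/* Philosopher p's left fork is p, right fork is (p + 1) % n;
   forks[f] is true while fork f is held. */
typedef struct { int n; int phil[MAXN]; bool forks[MAXN]; } World;

/* List enabled moves as (philosopher, side) pairs; side 0 = left, 1 = right.
   A thinking philosopher may take either free fork first (our assumption);
   a philosopher holding one fork needs the other. */
int enabled_moves(const World *w, int moves[][2]) {
    int k = 0;
    for (int p = 0; p < w->n; p++) {
        int l = p, r = (p + 1) % w->n, st = w->phil[p];
        bool want_l = (st == THINKING || st == HAS_RIGHT) && !w->forks[l];
        bool want_r = (st == THINKING || st == HAS_LEFT) && !w->forks[r];
        if (want_l) { moves[k][0] = p; moves[k][1] = 0; k++; }
        if (want_r) { moves[k][0] = p; moves[k][1] = 1; k++; }
    }
    return k;
}

/* Take the first fork, or take the second fork, eat, put both down and
   finish (no loop: each philosopher eats once). */
void apply(World *w, int p, int side) {
    int l = p, r = (p + 1) % w->n;
    if (w->phil[p] == THINKING) {
        w->forks[side ? r : l] = true;
        w->phil[p] = side ? HAS_RIGHT : HAS_LEFT;
    } else {
        w->forks[l] = w->forks[r] = false;
        w->phil[p] = DONE;
    }
}

/* Deadlock: no move is enabled but not every philosopher has finished. */
bool is_deadlock(const World *w) {
    int moves[2 * MAXN][2];
    if (enabled_moves(w, moves) > 0) return false;
    for (int p = 0; p < w->n; p++)
        if (w->phil[p] != DONE) return true;
    return false;
}

/* Random simulation: `iterations` random interleavings, one move per step.
   Returns the shortest deadlock depth seen, or -1 if none was seen. */
int shortest_deadlock(int n, int iterations, unsigned seed) {
    int best = -1;
    int moves[2 * MAXN][2];
    srand(seed);
    for (int i = 0; i < iterations; i++) {
        World w = { .n = n };                 /* everyone thinking, all forks free */
        for (int depth = 0; ; depth++) {      /* ends within 2n moves */
            int k = enabled_moves(&w, moves);
            if (k == 0) {
                if (is_deadlock(&w) && (best < 0 || depth < best)) best = depth;
                break;
            }
            int m = rand() % k;
            apply(&w, moves[m][0], moves[m][1]);
        }
    }
    return best;
}
```

Each iteration explores a single random interleaving, so `shortest_deadlock` can return -1 even though deadlocks exist; that is the partial nature of random simulation. The looping (normal) version would additionally need a depth bound.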

Dining Philosophers Experiment (2)

Time (seconds) to find shortest path; combined = Lurch for 5 sec., then SPIN (or NuSMV) if no property violations found by Lurch.

Tool               Statistic   Normal   No Loop   Overall
Lurch              average     1.33     0.281     0.806
                   median      0.223    0.063     0.135
                   max         6.83     1.19      6.83
SPIN               average     4.99     34        19.5
                   median      0.47     0.741     0.49
                   max         29.9     236       236
Combined (SPIN)    average     4.83     0.281     2.56
                   median      0.223    0.063     0.135
                   max         34.9     1.19      34.9
NuSMV              average     87.5     4.99      46.3
                   median      5.15     2.12      3.07
                   max         550      19.4      550
Combined (NuSMV)   average     69.8     0.281     35
                   median      0.223    0.063     0.135
                   max         555      1.19      555

In both cases, the combined strategy (Lurch + SPIN or Lurch + NuSMV) saves time.

Lurch Input Models: C Code + Finite-State Machines

• Lurch transitions may refer to arbitrary C code.

• For example, we could use a C variable for the turn variable in our producer-consumer model:

    enum {P,C} turn = P;
    %%
    pr_wait; (turn==P); -; produce;
    produce; -; {turn=C;}; pr_wait;
    cs_wait; (turn==C); -; consume;
    consume; -; {turn=P;}; cs_wait;

Parentheses and braces within transitions mark references to C expressions and statements, respectively.

%% separates the C code from the finite-state machines.

Each finite-state machine is a list of transitions.
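To make the transition syntax concrete, here is a hand translation of the producer-consumer model into plain C. It reads each transition as `current state; input; output; next state;`, which is our interpretation of the example rather than a documented Lurch format; the `Transition` struct and the function names are hypothetical, and Lurch's own generated code may look quite different.

```c
#include <stdbool.h>
#include <stddef.h>

/* C section of the model (before the %% line). */
enum { P, C } turn = P;

/* Hypothetical in-memory form of "from; input; output; to;".
   A NULL input stands for "-" (always enabled); a NULL output for "-" (no action). */
typedef struct {
    int machine, from;
    bool (*input)(void);
    void (*output)(void);
    int to;
} Transition;

enum { PR_WAIT, PRODUCE };   /* local states of the producer (machine 0) */
enum { CS_WAIT, CONSUME };   /* local states of the consumer (machine 1) */

bool turn_is_p(void) { return turn == P; }   /* (turn==P) */
bool turn_is_c(void) { return turn == C; }   /* (turn==C) */
void give_to_c(void) { turn = C; }           /* {turn=C;} */
void give_to_p(void) { turn = P; }           /* {turn=P;} */

const Transition model[] = {
    { 0, PR_WAIT, turn_is_p, NULL,      PRODUCE },
    { 0, PRODUCE, NULL,      give_to_c, PR_WAIT },
    { 1, CS_WAIT, turn_is_c, NULL,      CONSUME },
    { 1, CONSUME, NULL,      give_to_p, CS_WAIT },
};
enum { NTRANS = sizeof model / sizeof model[0] };

int state[2] = { PR_WAIT, CS_WAIT };

/* One synchronous time tick: collect the transitions enabled in the current
   state, then execute them. Here each machine has at most one enabled
   transition per tick, so no same-machine transitions need discarding. */
void step(void) {
    const Transition *enabled[NTRANS];
    int k = 0;
    for (int i = 0; i < NTRANS; i++)
        if (state[model[i].machine] == model[i].from &&
            (model[i].input == NULL || model[i].input()))
            enabled[k++] = &model[i];
    for (int i = 0; i < k; i++) {
        if (enabled[i]->output != NULL) enabled[i]->output();
        state[enabled[i]->machine] = enabled[i]->to;
    }
}
```

Calling `step()` four times walks producer and consumer once around their cycle: the producer moves to `produce`, hands the turn to the consumer, the consumer moves to `consume`, and hands the turn back.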

RA-RRE Model

• Work with John Powell (NASA JPL).

• Resource arbitration (RA) system on board a robotic remote exploration (RRE) vehicle.
  – User processes make requests for RRE resources through a message queue.
  – User processes run concurrently with an arbiter process, which responds to requests in the queue.
  – Arbiter will Grant, Deny, Pend, Rescind, or Deny and Rescind a resource request.
  – Arbiter filters out nonsense messages and ignores them.

RA-RRE Model (2)

• Large Stateflow® model:
  – C code embedded inside states to represent complex internal system behaviors.
  – JPL’s HiVy translator used to generate Promela (SPIN’s input language) with embedded C code.
  – Translated from Stateflow® to Lurch with C code references in transitions.
  – While it can be very difficult to use Promela’s C code embedding features correctly, Powell reports that it was not difficult to use C code in Lurch models, even after just 15 hours of informal training.

• Lurch results matched SPIN’s, finding deadlocks in six different versions of the model.
  – Different versions created by running the HiVy translator with or without various optimizations, and running models with minor fixes put into the code.

RA-RRE Model (3)

• Finding errors (deadlock)
  – SPIN: Found deadlock.
  – Lurch: Found deadlock.

• Finding errors (property violation)
  – SPIN: Model too large to verify properties.
  – Lurch: Found multiple variations on deadlock over properties.

• Embedded C code
  – SPIN: Steep learning curve.
  – Lurch: Easily accomplished with minimal training.

• Diagnosis of error causes
  – SPIN: Errors in embedded C code were masked as syntactic / semantic problems embedding C into Promela.
  – Lurch: Easily instrumented to provide visibility into embedded C code errors. This led to the discovery of an error relating to fundamental system specification conflicts.

• Powell’s conclusion: compared to SPIN, Lurch is easy to use for models with embedded C code; Lurch found the same errors consistently.

Lurch Implementation

    step(Q, state)
      while (Q not empty)
        tr := pop(Q)
        exec_outputs(tr, state)
        for (tr' in same machine as tr) del(Q, tr')

    check(state)
      fault_check(state)
      deadlock_check(state)
      cycle_check(state)

    search(iterations, depth)
      for (i in iterations)
        for (m in machines) state[m] = 0
        for (d in depth)
          for (tr in transitions)
            if (check_inputs(tr)) random_push(Q, tr)
          step(Q, state)
          check(state)

• Lurch’s partial, random search procedure:
  – Partial: there is no guarantee that all behavior will be explored.
  – Random: the choice of which behavior to explore is nondeterministic.

The basic search procedure is repeated at each time tick.

Each iteration explores one global state path through the behavior of the system. A path is divided into “time ticks.” At each time tick a state vector (with a value for each machine) is updated.
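A runnable C sketch of the `search` and `step` pseudocode, under stated assumptions: the two-machine toy model, its guard functions and its fault state are hypothetical, and only the fault check is implemented (deadlock and cycle checks are left out). `random_push` inserts each enabled transition at a uniformly random queue position, so the first transition popped for each machine is a uniformly random choice among that machine's enabled transitions.

```c
#include <stdbool.h>
#include <stdlib.h>

#define NMACH 2

/* Hypothetical transition: machine m moves from local state `from` to `to`
   when `guard` holds on the current global state vector (check_inputs). */
typedef struct { int m, from, to; bool (*guard)(const int *s); } Tr;

bool always(const int *s) { (void)s; return true; }
bool m1_done(const int *s) { return s[1] == 1; }

/* Toy model: machine 0 may idle (0->0) or finish (0->1); once machine 1 has
   reached state 1, machine 0 may also enter state 2, treated as a fault. */
const Tr trs[] = {
    { 0, 0, 0, always }, { 0, 0, 1, always }, { 0, 0, 2, m1_done }, { 1, 0, 1, always },
};
enum { NTR = sizeof trs / sizeof trs[0] };

/* random_push: insert at a uniformly random position (inside-out
   Fisher-Yates), so reading the queue front to back gives a random order. */
void random_push(const Tr **q, int *len, const Tr *tr) {
    int pos = rand() % (*len + 1);
    if (pos != *len) q[*len] = q[pos];
    q[pos] = tr;
    (*len)++;
}

/* step: pop in queue order; the first transition popped for a machine fires,
   later ones for the same machine are discarded (the del(Q, tr') loop). */
void step(const Tr **q, int len, int *s) {
    bool moved[NMACH] = { false };
    for (int i = 0; i < len; i++) {
        if (moved[q[i]->m]) continue;
        s[q[i]->m] = q[i]->to;                  /* exec_outputs */
        moved[q[i]->m] = true;
    }
}

/* search: `iterations` random paths of at most `depth` ticks. Returns the
   tick at which the fault was first reached, or -1 (which proves nothing). */
int search(int iterations, int depth, unsigned seed) {
    srand(seed);
    for (int i = 0; i < iterations; i++) {
        int s[NMACH] = { 0 };                   /* every machine starts in state 0 */
        for (int d = 1; d <= depth; d++) {
            const Tr *q[NTR];
            int len = 0;
            for (int t = 0; t < NTR; t++)       /* enqueue transitions whose inputs hold */
                if (s[trs[t].m] == trs[t].from && trs[t].guard(s))
                    random_push(q, &len, &trs[t]);
            if (len == 0) break;                /* end of this global state path */
            step(q, len, s);
            if (s[0] == 2) return d;            /* fault check */
        }
    }
    return -1;
}
```

Because guards are evaluated before any transition fires, all machines see the same pre-tick state vector, which is what makes this the synchronous variant.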

Lurch Implementation (2)

• The step function is called at each time tick along a global state path.

• Its input is a queue of transitions whose inputs are satisfied, along with the state vector.

• Transitions are popped from the queue, and their outputs are executed.

• The effects of the executed transitions are stored in the state vector.

• Only one transition from each machine can be executed at each time step; others are discarded from the queue.

Lurch Implementation (3)

• With the step function as-is (as described in the previous slide), Lurch simulates synchronous execution of finite-state machines: at each time step, every machine is given a chance to move forward.

• If the step function is modified so that only one transition (one out of all the machines) is executed at each time step, Lurch simulates asynchronous execution of the system: any interleaving of machine behaviors can be explored.

[Figure: synchronous execution goes from state = < 0, 0, 0 > to state = < 1, 1, 1 > in one time step; asynchronous execution goes < 0, 0, 0 > → < 1, 0, 0 > → < 1, 1, 0 > → < 1, 1, 1 >, one machine per time step.]
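The difference between the two step variants can be sketched in C on the three-machine example from the figure, assuming (hypothetically) that each machine has a single transition from local state 0 to local state 1. `step_sync` and `step_async` are illustrative names.

```c
#include <stdlib.h>

#define NMACH 3

/* Hypothetical model from the figure: each machine has one transition, 0 -> 1. */
int is_enabled(const int *s, int m) { return s[m] == 0; }

/* Synchronous step: every machine with an enabled transition moves.
   The next state goes into a buffer so all machines see the pre-step state. */
void step_sync(int *s) {
    int next[NMACH];
    for (int m = 0; m < NMACH; m++) next[m] = is_enabled(s, m) ? 1 : s[m];
    for (int m = 0; m < NMACH; m++) s[m] = next[m];
}

/* Asynchronous step: exactly one enabled machine, chosen at random, moves.
   Returns 0 when no machine can move (end of the global state path). */
int step_async(int *s) {
    int cand[NMACH], k = 0;
    for (int m = 0; m < NMACH; m++)
        if (is_enabled(s, m)) cand[k++] = m;
    if (k == 0) return 0;
    s[cand[rand() % k]] = 1;
    return 1;
}
```

One call to `step_sync` takes < 0, 0, 0 > straight to < 1, 1, 1 >; `step_async` needs three calls, and the random choice of machine decides which interleaving is followed.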

Lurch Implementation (4)

• At each time tick along a path Lurch checks for local-state faults, deadlocks and cycles.

• Local-state faults can be found directly from the state vector: if one of the machines is in a state corresponding to a fault, Lurch reports that the fault was reached.

• A deadlock occurs when Lurch reaches the end of a global state path (a state for which no new transition’s inputs are satisfied) but not all machines are in a state identified as a legal end state.

• Deadlocks are found by looping through the state vector to make sure all local states are legal end states (this is done only when Lurch is at the end of a global state path).
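A minimal C sketch of the two checks, assuming each machine's local states are numbered 0 to 3 and that fault and legal-end-state flags are supplied per machine in a hypothetical `MachineInfo` table. `n_enabled` is the number of transitions whose inputs are satisfied in the current state, as computed by the search loop; cycle detection is not shown.

```c
#include <stdbool.h>

#define NMACH 3
#define NLOCAL 4   /* local states per machine (assumed) */

/* Hypothetical per-machine table of fault states and legal end states. */
typedef struct { bool is_fault[NLOCAL]; bool is_end[NLOCAL]; } MachineInfo;

/* fault_check: local-state faults are visible directly in the state vector.
   Returns the first machine sitting in a fault state, or -1. */
int fault_check(const int *state, const MachineInfo *info) {
    for (int m = 0; m < NMACH; m++)
        if (info[m].is_fault[state[m]]) return m;
    return -1;
}

/* deadlock_check: only meaningful at the end of a global state path, i.e.
   when no transition's inputs are satisfied (n_enabled == 0). It is a
   deadlock if some machine is not in a legal end state. */
bool deadlock_check(const int *state, const MachineInfo *info, int n_enabled) {
    if (n_enabled > 0) return false;   /* path continues: nothing to check yet */
    for (int m = 0; m < NMACH; m++)
        if (!info[m].is_end[state[m]]) return true;
    return false;
}
```

Only the fault check runs on every tick; the loop over the state vector in `deadlock_check` is skipped whenever the path can still continue, matching the slide's note that it runs only at the end of a path.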

Other Applications for Lurch’s Random Simulation

• Game-playing experiments: n-queens, tic-tac-toe.

• Lurch is really a fast generator of consistent temporal sequences, so what else can we use it for?

• If we generate a score for each temporal sequence, we can use a machine learner to suggest what makes some sequences better than others.

• Lurch + Machine Learning = “Lean,” a randomized heuristic search tool for finite-state models (with optional C code).

Lean: Combining “Test” and “Task”

• Traditional view: specialized devices for different tasks.
  – Diagnosis, configuration, testing...

• Alternative: one environment where “test” and “task” are implemented together:
  – Write down what is known about a domain.
  – Add an oracle to score a single run (i.e., score the temporal sequences generated by Lurch).
  – Instead of different devices for “test” and “task”: “Lean” = Lurch + learn.

• Run Lurch on sample space of options.

• Learn—apply machine learning to find “nudges,” which are suggestions for which transitions lead to runs with higher scores.

• Apply “nudges” in the form of transition probabilities, and run Lurch again, expecting better scores.
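The run/learn/apply loop can be sketched in C. The deck does not say which learner Lean uses, so this sketch swaps in a deliberately simple stand-in: a transition's weight is the ratio of how often it fired in runs scoring at or above the mean to how often it fired in the other runs, with add-one smoothing. `Run`, `learn_nudges` and `pick` are hypothetical names; `pick` shows how such weights could act as transition probabilities on the next Lurch run.

```c
#include <stdlib.h>

#define NTR 4   /* transitions in a hypothetical model */

/* One scored Lurch run (hypothetical record): how often each transition
   fired, plus the oracle's score for the run. */
typedef struct { int fired[NTR]; double score; } Run;

/* Stand-in learner, NOT Lean's actual one: weight each transition by how
   often it fired in runs scoring at or above the mean versus the other runs.
   Add-one smoothing keeps every weight positive and finite. */
void learn_nudges(const Run *runs, int n, double *weight) {
    double mean = 0;
    for (int r = 0; r < n; r++) mean += runs[r].score;
    mean /= n;
    for (int t = 0; t < NTR; t++) {
        double good = 1, bad = 1;
        for (int r = 0; r < n; r++) {
            if (runs[r].score >= mean) good += runs[r].fired[t];
            else bad += runs[r].fired[t];
        }
        weight[t] = good / bad;   /* > 1 nudges towards t, < 1 away from it */
    }
}

/* Apply the nudges: choose among the k enabled transitions with probability
   proportional to their weights, instead of uniformly. */
int pick(const int *enabled, int k, const double *weight) {
    double total = 0;
    for (int i = 0; i < k; i++) total += weight[enabled[i]];
    double x = (double)rand() / ((double)RAND_MAX + 1) * total;
    for (int i = 0; i < k; i++) {
        x -= weight[enabled[i]];
        if (x < 0) return enabled[i];
    }
    return enabled[k - 1];   /* guards against floating-point rounding */
}
```

With all weights equal to 1, `pick` reduces to Lurch's uniform random choice, so the nudged run differs from a plain run only where the learner found a difference between high- and low-scoring runs.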

Chemical Factory (Lean)

• Work with Tom Burkleau, Portland State University.

• Finite-state machine model of a commercial vodka distillery plant.

• Multiple machines representing the space of options, the model of the production facility, and the relation between production parts.

[Figure: Nominal Model (composite); Faulty Model (composite).]

Optimizing Nominal Model

After 7 scored runs of Lurch, plus machine learning to find “nudges”:

Optimizing Faulty Model

• 26 repeats of <LURCH,learn>

• Change learning classes:
  – Class1: fixed
  – Class2: movable
  – Learn selectors for class2

• Negate them (removes the bug)
  – 1 more repeat of <LURCH,learn>

• Question: is this simulation or optimization or parameter tuning or fault localization or diagnosis or configuration?

• Answer: all of the above

[Figure annotations: “Gone!”; “Fixed, refuses to budge”.]

Conclusion

• Combining random simulation (Lurch) with model checking (SPIN or NuSMV) can be faster and more efficient than model checking alone, without sacrificing completeness.
  – FGS (Heimdahl, Gao at UMN), leader election protocol, dining philosophers experiments.

• Lurch allows (easy-to-use) references to arbitrary C code.
  – RA-RRE model experiments (Powell at JPL).

• Lurch uses a simple random search procedure, plus early stopping heuristics and modifications for asynchronous models, hierarchical models, etc.

• Lean = Lurch + machine learning.
  – Chemical factory optimization experiment (Burkleau at PSU).
