Integrating Model Checking and Procedural Languages
David Owen
July 19, 2004
2. Overview
• Background: verification / search tools, criteria for when to use which tool, combining different strategies.
• Experiments: flight guidance system, leader election protocol, dining philosophers, resource arbiter.
• Implementation: Lurch, our random simulation tool for finite-state models.
• Lean: Lurch + machine learning.
• Lean experiment: chemical factory optimization.
3. A Continuum of Testing and Verification Tools
• A range of tools exists, from traditional software testing to automated verification.
– Simulation tools that approximate full verification but work on more complex models.
– Sophisticated testing tools capable of detecting more complex errors.
[Figure: a continuum running from Traditional Software Testing (Real Languages; Complex Models, Simple Errors) to Automated Verification (Model Checking; Simple Models, Complex Errors), with More Sophisticated Testing Tools and Tools to Approximate Full Verification in between.]
4. Changing Expectations of a Software Analyst
• Cobleigh et al. idea: three modes of analysis.
– Exploratory mode: quick feedback needed to learn how the system works and refine properties.
– Fault-finding mode: short and clear error traces needed for debugging.
– Maintenance mode: completeness, scalability needed to verify overall system.
• Different tools have different strengths.
– Simulation tools good for exploratory mode.
– Symbolic model checking good for short error traces.
– Explicit-state model checking good for speed and scalability.
5. Combining Complementary Strategies
• Different tools have different strengths and weaknesses.
– Cobleigh et al. suggest “The Right Algorithm at the Right Time” (ICSE 2001).
– We’ve had some success with a different approach, combining complementary strategies (regardless of analyst’s mode).
– Start with a quick, incomplete tool; if no errors are found after a few seconds, use a model checker (complete verification).
[Figure: flowchart. Quick, Incomplete Search; if Errors Found, Done; if No Errors Found, Model Checker.]
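The combined strategy on this slide can be sketched in a few lines. This is a minimal illustration, not the tooling used in the experiments: `quick_search` and `model_check` are hypothetical callables standing in for a random simulator and a model checker, and the 5-second default mirrors the budget quoted on the later result slides.

```python
import time
from typing import Callable, List, Optional, Tuple

# Hypothetical interfaces (assumptions, not from the slides):
#   quick_search(deadline) -> an error trace, or None if nothing was found in time
#   model_check()          -> an error trace, or None if the model is verified
Trace = List[str]

def combined_check(
    quick_search: Callable[[float], Optional[Trace]],
    model_check: Callable[[], Optional[Trace]],
    budget_seconds: float = 5.0,
) -> Tuple[str, Optional[Trace]]:
    """Run the quick, incomplete search first; fall back to the complete checker."""
    deadline = time.monotonic() + budget_seconds
    trace = quick_search(deadline)
    if trace is not None:
        return "quick search", trace        # errors found: done
    return "model checker", model_check()   # no errors found: verify completely
```

The quick tool can only report errors, never their absence, so a `None` from it always hands over to the complete checker.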
6. Random Simulation of Concurrent System Models
• Randomized algorithms are known to be simple, fast and effective in many domains.
• West used random simulation to detect errors in concurrent system models.
– This approach was surprisingly successful.
– Success was attributed to the fact that most errors detected are much less complex than the overall system.
• We have implemented a similar random simulation in a tool called Lurch.
– Added early stopping heuristics.
– C code can be included in the model.
7. Flight Guidance System Experiment
• Work with Mats Heimdahl and Jimin Gao (University of Minnesota).
• Ran Lurch and NuSMV on a model representing mode logic from a Rockwell-Collins flight guidance system.
– Seeded faults based on developers’ revision history.
– Used NuSMV to (exhaustively) determine which properties were violated by faulty specifications.
– Tried to find the violations with Lurch (random simulation of the model).
– Put Lurch and NuSMV results together to evaluate combined strategy.
8. Flight Guidance System Experiment (2)
                      Lurch < 5   Lurch > 5   Lurch ?   Overall
Lurch     average     1.49        553         -         -
          median      1.03        40.1        -         -
          max         4.43        5,400       -         -
NuSMV     average     4,380       12,200      14,000    8,200
          median      3,290       3,890       14,000    3,540
          max         17,500      141,000     27,600    141,000
Combined  average     1.49        12,200      14,000    5,910
          median      1.03        3,890       14,000    3.92
          max         4.43        141,000     27,600    141,000
Property violations not detected by Lurch.
Combined strategy improves average by over ½ hour.
Time (seconds) to verify or find error plotted; combined = Lurch for 5 sec., then NuSMV if no property violations found by Lurch.
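The "over ½ hour" figure follows from the two overall averages reported on this slide (8,200 s for NuSMV alone, 5,910 s for the combined strategy). A quick check of the arithmetic, using only those two numbers:

```python
nusmv_avg_s = 8200.0      # overall average, NuSMV alone (from the slide)
combined_avg_s = 5910.0   # overall average, combined strategy (from the slide)

saving_s = nusmv_avg_s - combined_avg_s
saving_min = saving_s / 60

print(f"{saving_s:.0f} s = {saving_min:.1f} min")  # prints "2290 s = 38.2 min"
```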
9. Leader Election Protocol Experiment
• Protocol published as an example for SPIN (Holzmann 1997 TSE article).
• N processes communicating via message queues interact to choose one leader process.
• Checked for liveness property always(eventually(one “leader” chosen)).
• Ran Lurch + SPIN combination strategy on original and two fault-seeded versions of the model.
– Seeded faults: where a process is sending out a message, the wrong message type was used.
– Two different fault-seeded versions created: one that turned out easy, another that turned out harder.
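To make the liveness property concrete: on an infinite run that eventually repeats (a finite prefix followed by a cycle), always(eventually(p)) holds exactly when p holds somewhere inside the cycle. The sketch below assumes that "lasso" representation and a state that is simply the number of leaders; neither detail comes from the slides, and it says nothing about how SPIN or Lurch check the property internally.

```python
from typing import Callable, Sequence, TypeVar

S = TypeVar("S")

def always_eventually(prefix: Sequence[S], cycle: Sequence[S],
                      p: Callable[[S], bool]) -> bool:
    """Check always(eventually(p)) on the infinite path prefix + cycle + cycle + ..."""
    if not cycle:
        raise ValueError("an infinite path needs a non-empty cycle")
    # The prefix is visited only once, so only states in the cycle recur forever.
    return any(p(s) for s in cycle)

# Hypothetical state: the number of processes that currently consider themselves leader.
one_leader = lambda leaders: leaders == 1

print(always_eventually([0, 0], [1], one_leader))     # leader chosen and kept
print(always_eventually([0, 1], [0, 0], one_leader))  # leader seen once, then lost for good
```

The second call is a violation: a leader appears in the prefix, but the run then cycles forever without one.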
10. Leader Election Protocol Experiment (2)
                      Correct   Fault 1   Fault 2   Overall
Lurch     average     -         0.137     1.60      -
          median      -         0.128     0.183     -
          max         -         0.173     7.19      -
SPIN      average     49.2      0.059     31.2      23.4
          median      4.67      0.055     3.21      0.125
          max         244       0.08      190       244
Combined  average     54.2      0.137     20.4      20.4
          median      9.67      0.128     0.183     0.173
          max         249       0.173     195       249
Although SPIN alone is better on the correct and first fault-seeded versions, average for combined strategy is still better overall.
Time (seconds) to verify or find error plotted; combined = Lurch for 5 sec., then SPIN if no property violations found by Lurch.
11. Leader Election Protocol Experiment (3)
• This plot shows the time required for Lurch and SPIN running on a model with both of the seeded faults described previously.
– Instances with an odd number of processes are much more difficult for SPIN, but not for Lurch.
– This demonstrates a well-known benefit of some randomized algorithms: less sensitivity to (apparently) minor changes in the input.
12. Dining Philosophers Experiment
• Two different versions of the problem:
– Normal: n philosophers seated around a table; each repeatedly tries to acquire left and right forks, eat, and then set down the forks.
– No loop: same as normal version, except philosophers only try to eat once.
• Both versions of the problem contain two deadlocks at depth n.
• We ran Lurch, SPIN and NuSMV, until the shortest path to a deadlock was found.
• The normal version was harder for NuSMV and Lurch; the no-loop version was harder for SPIN.
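The claim of two deadlocks at depth n can be checked on a small instance. The sketch below is an exhaustive breadth-first search over one possible encoding of the normal (looping) version; it is not how Lurch, SPIN or NuSMV were run. The encoding (a tuple recording who holds each fork, forks picked up one at a time, both forks set down together) is an assumption made for illustration, and it expects n ≥ 2.

```python
from collections import deque

def successors(owners):
    """Yield every state reachable in one move; owners[f] is the holder of fork f, or None."""
    n = len(owners)
    for i in range(n):
        left, right = i, (i + 1) % n
        if owners[left] == i and owners[right] == i:
            # Philosopher i has eaten: set both forks down.
            nxt = list(owners)
            nxt[left] = None
            nxt[right] = None
            yield tuple(nxt)
            continue
        for fork in (left, right):
            if owners[fork] is None:
                # Philosopher i picks up a free fork.
                nxt = list(owners)
                nxt[fork] = i
                yield tuple(nxt)

def shallowest_deadlocks(n):
    """Breadth-first search; return {deadlocked state: depth at which it is first reached}."""
    start = (None,) * n
    depth = {start: 0}
    queue = deque([start])
    deadlocks = {}
    while queue:
        state = queue.popleft()
        nexts = list(successors(state))
        if not nexts:
            deadlocks[state] = depth[state]   # no move is possible from here
        for nxt in nexts:
            if nxt not in depth:
                depth[nxt] = depth[state] + 1
                queue.append(nxt)
    return deadlocks

print(shallowest_deadlocks(3))
```

Under this encoding the only deadlocks are "everyone holds their left fork" and "everyone holds their right fork", each reached after exactly n pick-ups.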
13. Dining Philosophers Experiment (2)
                              Normal   No Loop   Overall
Lurch             average     1.33     0.281     0.806
                  median      0.223    0.063     0.135
                  max         6.83     1.19      6.83
SPIN              average     4.99     34        19.5
                  median      0.47     0.741     0.49
                  max         29.9     236       236
Combined (SPIN)   average     4.83     0.281     2.56
                  median      0.223    0.063     0.135
                  max         34.9     1.19      34.9
NuSMV             average     87.5     4.99      46.3
                  median      5.15     2.12      3.07
                  max         550      19.4      550
Combined (NuSMV)  average     69.8     0.281     35
                  median      0.223    0.063     0.135
                  max         555      1.19      555
In both cases, the combined strategy (Lurch + SPIN or Lurch + NuSMV) saves time.
Time (seconds) to find shortest path plotted; combined = Lurch for 5 sec., then SPIN or NuSMV if no property violations found by Lurch.
14. Lurch Input Models: C Code + Finite-State Machines
• Lurch transitions may refer to arbitrary C code.
• For example, we could use a C variable for the turn variable in our producer-consumer model:
enum {P,C} turn = P;
%%
pr_wait; (turn==P); -; produce;
produce; -; {turn=C;}; pr_wait;

cs_wait; (turn==C); -; consume;
consume; -; {turn=P;}; cs_wait;
Parentheses and braces within transitions mark references to C expressions and statements.
%% separates the C code from the finite-state machines.
Each finite-state machine is a list of transitions.
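To see how the turn variable serializes the two machines, the model can be mimicked in a few lines of Python. This is an illustration only: it reads each transition as "current state; input; output; next state" (an interpretation of the example, not a statement of Lurch's grammar), replaces the C variable with a dictionary entry, and steps both machines synchronously.

```python
# Shared variable standing in for the slide's C variable `turn` (assumption: a dict entry).
env = {"turn": "P"}

def set_turn(value):
    """Build an output action that assigns `value` to turn."""
    def action():
        env["turn"] = value
    return action

# Each transition: (current state, guard or None, action or None, next state).
producer = [
    ("pr_wait", lambda: env["turn"] == "P", None, "produce"),
    ("produce", None, set_turn("C"), "pr_wait"),
]
consumer = [
    ("cs_wait", lambda: env["turn"] == "C", None, "consume"),
    ("consume", None, set_turn("P"), "cs_wait"),
]

def run(machines, start, ticks):
    """Synchronous stepping: at each tick every machine fires its enabled transition, if any."""
    state = list(start)
    history = [tuple(state)]
    for _ in range(ticks):
        # Evaluate all guards before running any action, so every guard sees the same snapshot.
        chosen = []
        for m, transitions in enumerate(machines):
            for cur, guard, action, nxt in transitions:
                if cur == state[m] and (guard is None or guard()):
                    chosen.append((m, action, nxt))
                    break
        for m, action, nxt in chosen:
            if action is not None:
                action()
            state[m] = nxt
        history.append(tuple(state))
    return history

for step_state in run([producer, consumer], ("pr_wait", "cs_wait"), 4):
    print(step_state)
```

In this run the producer and consumer never occupy `produce` and `consume` at the same tick, which is the mutual exclusion the turn variable is there to provide.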
15. RA-RRE Model
• Work with John Powell (NASA JPL).
• Resource arbitration (RA) system on board a robotic remote exploration (RRE) vehicle.
– User processes make requests for RRE resources through a message queue.
– User processes run concurrently with an arbiter process, which responds to requests in the queue.
– Arbiter will Grant, Deny, Pend, Rescind or Deny and Rescind a resource request.
– Arbiter filters out nonsense messages and ignores them.
16. RA-RRE Model (2)
• Large Stateflow® model:
– C code embedded inside states to represent complex internal system behaviors.
– JPL’s HiVy translator used to generate Promela (SPIN’s input language) with embedded C code.
– Translated from Stateflow® to Lurch with C code references in transitions.
– While it can be very difficult to correctly use Promela’s C code embedding features, Powell reports that it was not difficult to use C code in Lurch models, even after just 15 hours of informal training.
• Lurch results matched SPIN’s, finding deadlocks in six different versions of the model.
– Different versions created by running HiVy translator with or without various optimizations, and running models with minor fixes put into the code.
17. RA-RRE Model (3)
Comparison of SPIN and Lurch:
• Finding Errors: Deadlock
– SPIN: Found deadlock.
– Lurch: Found deadlock.
• Finding Errors: Property Violation
– SPIN: Model too large to verify properties.
– Lurch: Found multiple variations on deadlock over properties.
• Embedded C Code
– SPIN: Steep learning curve.
– Lurch: Easily accomplished with minimal training.
• Diagnosis of Error Causes
– SPIN: Masked errors in embedded C code as syntactic / semantic problems embedding C into Promela.
– Lurch: Easily instrumented to provide visibility into embedded C code errors. This led to discovery of error relating to fundamental system specification conflicts.
• Powell’s conclusion: compared to SPIN, Lurch easy to use for models with embedded C code; Lurch found same errors consistently.
18. Lurch Implementation
step(Q, state)
  while (Q not empty)
    tr := pop(Q)
    exec_outputs(tr, state)
    for (tr' in same machine as tr)
      del(Q, tr')

check(state)
  fault_check(state)
  deadlock_check(state)
  cycle_check(state)

search(iterations, depth)
  for (i in iterations)
    for (m in machines)
      state[m] = 0
    for (d in depth)
      for (tr in transitions)
        if (check_inputs(tr))
          random_push(Q, tr)
      step(Q, state)
      check(state)
• Lurch’s partial, random search procedure:
– Partial: there is no guarantee that all behavior will be explored.
– Random: the choice of which behavior to explore is nondeterministic.
The basic search procedure is repeated each time tick.
Each iteration explores one global state path through the behavior of the system. A path is divided into “time ticks.” At each time tick a state vector (with a value for each machine) is updated.
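The search procedure on this slide can be rendered as a short runnable sketch. It is an approximation under stated assumptions, not Lurch's implementation: transitions are tuples `(cur, guard, action, nxt)`, `random_push` is imitated by shuffling the queue, only the local-state fault check is included (no deadlock check, cycle check or early stopping), and a path simply ends when no transition is enabled.

```python
import random

def step(queue, state):
    """Pop transitions, run their outputs, and drop the rest from the same machine."""
    while queue:
        m, (cur, guard, action, nxt) = queue.pop()
        if action is not None:
            action(state)
        state[m] = nxt
        # Only one transition per machine may fire at each time tick.
        queue[:] = [(m2, tr) for m2, tr in queue if m2 != m]

def search(machines, faults, iterations, depth, rng):
    """Random partial search.

    machines[m] is a list of (cur, guard, action, nxt); guard and action take the
    state vector. faults is a set of (machine index, local state) pairs.
    Returns the first path that reaches a fault, or None.
    """
    for _ in range(iterations):
        state = [0] * len(machines)          # every machine starts in local state 0
        path = [tuple(state)]
        for _ in range(depth):
            # Queue every transition whose inputs are satisfied, in random order.
            queue = [(m, tr) for m, trs in enumerate(machines) for tr in trs
                     if tr[0] == state[m] and (tr[1] is None or tr[1](state))]
            rng.shuffle(queue)
            if not queue:
                break                        # end of this global state path
            step(queue, state)
            path.append(tuple(state))
            if any(state[m] == s for m, s in faults):
                return path
    return None

# Hypothetical model: machine 0 moves from 0 to either 1 or 2 (2 is a fault); machine 1 moves 0 -> 1.
m0 = [(0, None, None, 1), (0, None, None, 2)]
m1 = [(0, None, None, 1)]
print(search([m0, m1], {(0, 2)}, iterations=50, depth=3, rng=random.Random(0)))
```

Because the search is partial, a `None` result means only that no fault was seen in the sampled paths, not that the model is correct.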
19. Lurch Implementation (2)
• The step function is called at each time tick along a global state path.
• Input is a queue of transitions whose inputs are satisfied, along with the state vector.
• Transitions are popped from the queue, and their outputs are executed.
• The effect of transitions executed is stored in the state vector.
• Only one transition from each machine can be executed at each time step; others are discarded from the queue.
20. Lurch Implementation (3)
• With the step function as-is (as described in the previous slide), Lurch simulates synchronous execution of finite-state machines: at each time step, every machine is given a chance to move forward.
• If the step function is modified so that only one transition (one out of all the machines) is executed at each time step, Lurch simulates asynchronous execution of the system: all interleavings of machine behaviors are considered.
[Figure: synchronous execution moves from state = < 0, 0, 0 > directly to state = < 1, 1, 1 >; asynchronous execution passes from state = < 0, 0, 0 > through < 1, 0, 0 > and < 1, 1, 0 > to < 1, 1, 1 >.]
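The difference between the two modes comes down to how many queued transitions are allowed to fire per tick. A minimal sketch, assuming guard-free transitions written as `(cur, nxt)` pairs and a hypothetical `tick` helper (neither is Lurch's actual interface):

```python
import random

def enabled(machines, state):
    """All (machine index, next local state) pairs whose source state matches."""
    return [(m, nxt) for m, trs in enumerate(machines)
            for cur, nxt in trs if cur == state[m]]

def tick(machines, state, rng, synchronous):
    """Advance the state vector by one time tick and return the new vector."""
    queue = enabled(machines, state)
    rng.shuffle(queue)
    new_state = list(state)
    moved = set()
    for m, nxt in queue:
        if m in moved:
            continue          # only one transition per machine per tick
        new_state[m] = nxt
        moved.add(m)
        if not synchronous:
            break             # asynchronous: only one machine moves per tick
    return tuple(new_state)

# Three identical machines, each with a single transition 0 -> 1.
machines = [[(0, 1)], [(0, 1)], [(0, 1)]]
rng = random.Random(1)

print(tick(machines, (0, 0, 0), rng, synchronous=True))

state = (0, 0, 0)
for _ in range(3):
    state = tick(machines, state, rng, synchronous=False)
    print(state)
```

In synchronous mode all three machines move in a single tick; in asynchronous mode one machine moves per tick, in an order picked at random, so three ticks are needed to reach the same vector.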
21. Lurch Implementation (4)
• At each time tick along a path Lurch checks for local-state faults, deadlocks and cycles.
• Local state faults can be found directly from the state vector—if one of the machines is in a state corresponding to a fault, Lurch reports that the fault was reached.
• A deadlock occurs when Lurch reaches the end of a global state path (a state for which no new transition’s inputs are satisfied) but not all machines are in a state identified as a legal end state.
• Deadlocks are found by looping through the state vector to make sure all local states are legal end states (this is done only when Lurch is at the end of a global state path).
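The fault and deadlock checks described above reduce to simple scans over the state vector. The sketch below assumes per-machine sets of fault states and legal end states and a boolean saying whether any transition is still enabled; these names are illustrative, and the cycle check is left out because the slide does not describe how it works.

```python
def fault_check(state, fault_states):
    """Return the indices of machines whose local state is marked as a fault."""
    return [m for m, s in enumerate(state) if s in fault_states[m]]

def deadlock_check(state, end_states, any_enabled):
    """Deadlock: the path has ended (nothing enabled) but some machine is not in a legal end state."""
    if any_enabled:
        return False          # not at the end of a global state path yet
    return any(s not in end_states[m] for m, s in enumerate(state))

# Two machines; local state 3 is a legal end state for both, and state 2 is a fault for machine 1.
end_states = [{3}, {3}]
fault_states = [set(), {2}]

print(fault_check((0, 2), fault_states))
print(deadlock_check((3, 1), end_states, any_enabled=False))
print(deadlock_check((3, 3), end_states, any_enabled=False))
```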
22. Other Applications for Lurch’s Random Simulation
• Game playing experiments: n-queens, tic-tac-toe.
• Lurch is really a fast generator of consistent temporal sequences, so what else can we use it for?
• If we generate a score for each temporal sequence, we can use a machine learner to suggest what makes some sequences better than others.
• Lurch + Machine Learning = “Lean,” a randomized heuristic search tool for finite-state models (with optional C code).
23. Lean: Combining “Test” and “Task”
• Traditional view: specialized devices for different tasks.
– Diagnosis, configuration, testing...
• Alternative: one environment where “test” and “task” are implemented together:
– Write down what is known about a domain.
– Add an oracle to score a single run (i.e., score the temporal sequences generated by Lurch).
– Instead of different devices for “test” and “task”
– “Lean” = Lurch + learn
• Run Lurch on sample space of options.
• Learn—apply machine learning to find “nudges,” which are suggestions for which transitions lead to runs with higher scores.
• Apply “nudges” in the form of transition probabilities, and run Lurch again, expecting better scores.
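The run, learn, nudge loop can be illustrated with a deliberately simple stand-in learner. The slides do not say which machine learner Lean uses, so everything here is an assumption: `learn_nudges` just weights each choice by the mean score of the runs that used it, and `pick` turns those weights into selection probabilities for the next round.

```python
import random
from collections import defaultdict

def learn_nudges(runs):
    """runs: list of (choices, score). Return {choice: weight}.

    Stand-in learner (assumption): a choice's weight is the mean score of the runs that used it.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for choices, score in runs:
        for c in set(choices):
            totals[c] += score
            counts[c] += 1
    return {c: totals[c] / counts[c] for c in totals}

def pick(options, nudges, rng, default=1.0):
    """Choose one option with probability proportional to its (non-negative) nudge weight."""
    weights = [max(nudges.get(o, default), 0.0) for o in options]
    if sum(weights) == 0:
        return rng.choice(options)    # no preference: fall back to a uniform choice
    return rng.choices(options, weights=weights, k=1)[0]

# Hypothetical scored runs: each run lists the transitions it took and the oracle's score.
runs = [(["a", "x"], 10.0), (["b", "x"], 2.0), (["a", "y"], 6.0)]
nudges = learn_nudges(runs)
print(nudges)
print(pick(["a", "b"], nudges, random.Random(0)))
```

With these numbers, transition "a" ends up four times as likely to be chosen as "b" on the next round, which is the sense in which a nudge biases later runs toward higher scores.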
24. Chemical Factory (Lean)
• Work with Tom Burkleau, Portland State University.
• Finite-state machine model of commercial vodka distillery plant.
• Multiple machines representing the space of options, the model of the production facility, and the relation between production parts.
[Figures: Nominal Model (composite); Faulty Model (composite).]
25. Optimizing Nominal Model
After 7 scored runs of Lurch, plus machine learning to find “nudges”:
26. Optimizing Faulty Model
• 26 repeats of <LURCH,learn>
• Change learning classes:
– Class1: fixed
– Class2: movable
– Learn selectors for class2
• Negate them (removes the bug)
– 1 more repeat of <LURCH,learn>
• Question: is this simulation or optimization or parameter tuning or fault localization or diagnosis or configuration?
• Answer: all of the above
[Figure annotations: “Gone!”; “Fixed, refuses to budge”.]
27. Conclusion
• Combination of random simulation (Lurch) and model checking (SPIN or NuSMV) can be faster and more efficient than model checking alone, without sacrificing completeness.
– FGS (Heimdahl, Gao at UMN), leader election protocol, dining philosophers experiments.
• Lurch allows (easy-to-use) references to arbitrary C code.
– RA-RRE model experiments (Powell at JPL).
• Lurch uses a simple random search procedure, plus early stopping heuristics and modifications for asynchronous models, hierarchical models, etc.
• Lean = Lurch + machine learning.
– Chemical factory optimization experiment (Burkleau at PSU).