Verification Based on Run-Time, Field-Data, and Beyond
Séverine Colin Laboratoire d’Informatique (LIFC) Université de Franche-Comté-CNRS-INRIA
Leonardo Mariani Dipartimento di Informatica, Sistemistica e Comunicazione (DISCo)
Università di Milano Bicocca
Tope Omitola Computer Laboratory
University of Cambridge, UK
2/36
Outline
Traditional Run-Time Verification Techniques
– checking properties on execution data at run-time
Test and Verification Techniques based on Field-Data
– gathering execution data to increase the effectiveness of (off-line) test and verification techniques
Discussion on Test, Verification and Model-Checking
Conclusions
3/36
Run-Time Verification Techniques
Basic idea: extract an execution trace from a running program and analyze it to detect errors
To check for classical error patterns (data races, deadlocks)
To verify a program against a formal specification
4/36
Data races detection
Data race: two concurrent threads access a shared variable at the same time and at least one access is a write
The Eraser tool detects data races dynamically
It enforces that every shared variable is protected by some lock
The Eraser algorithm is used by PathExplorer and VisualThread
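The locking discipline checked by Eraser can be illustrated with a minimal sketch of lockset refinement: each shared variable keeps a candidate set of locks, which is intersected with the locks held by the accessing thread, and an empty set signals a possible race. The class and method names are hypothetical, and the sketch omits the refinements of the real tool (for example, special handling of initialization and read-only sharing).

```python
class LocksetMonitor:
    """Simplified lockset refinement (illustrative, not the Eraser implementation)."""

    def __init__(self):
        self.held = {}         # thread id -> set of locks currently held
        self.candidates = {}   # variable -> candidate lockset C(v)
        self.warnings = set()  # variables whose candidate set became empty

    def acquire(self, thread, lock):
        self.held.setdefault(thread, set()).add(lock)

    def release(self, thread, lock):
        self.held.get(thread, set()).discard(lock)

    def access(self, thread, var):
        locks = self.held.get(thread, set())
        if var not in self.candidates:
            # First access: every lock held now is still a candidate.
            self.candidates[var] = set(locks)
        else:
            # Keep only the locks that were held on every access so far.
            self.candidates[var] &= locks
        if not self.candidates[var]:
            self.warnings.add(var)  # no common lock protects var


eraser = LocksetMonitor()
# Thread t1: x under lock a, y under lock m.
eraser.acquire("t1", "a")
eraser.access("t1", "x")
eraser.release("t1", "a")
eraser.acquire("t1", "m")
eraser.access("t1", "y")
eraser.release("t1", "m")
# Thread t2: x under a different lock b, y under the same lock m.
eraser.acquire("t2", "b")
eraser.access("t2", "x")
eraser.release("t2", "b")
eraser.acquire("t2", "m")
eraser.access("t2", "y")
eraser.release("t2", "m")
print(eraser.warnings)
```

In this run only x is reported, because no single lock was held on all of its accesses, while y stays consistently protected by m.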
5/36
Deadlock Detection
Deadlock: can occur whenever multiple shared resources are required to accomplish a task
A model of the program is constructed during program execution
A deadlock shows up as a cycle in the dependency graph
Used by VisualThread and PathExplorer
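One common way to realise the dependency graph is a lock-order graph: an edge a -> b is recorded whenever a thread acquires b while holding a, and a cycle indicates a potential deadlock. The sketch below assumes that representation; the names are hypothetical and it is not taken from VisualThread or PathExplorer.

```python
from collections import defaultdict


class LockGraph:
    """Lock-order graph built from acquire/release events (illustrative)."""

    def __init__(self):
        self.held = defaultdict(list)  # thread -> locks currently held
        self.edges = defaultdict(set)  # lock -> locks acquired while it was held

    def acquire(self, thread, lock):
        for outer in self.held[thread]:
            if outer != lock:
                self.edges[outer].add(lock)
        self.held[thread].append(lock)

    def release(self, thread, lock):
        self.held[thread].remove(lock)

    def has_cycle(self):
        WHITE, GREY, BLACK = 0, 1, 2
        colour = defaultdict(int)  # every node starts WHITE

        def visit(node):
            colour[node] = GREY
            for succ in self.edges.get(node, ()):
                if colour[succ] == GREY:
                    return True  # back edge: the graph has a cycle
                if colour[succ] == WHITE and visit(succ):
                    return True
            colour[node] = BLACK
            return False

        return any(colour[n] == WHITE and visit(n) for n in list(self.edges))


graph = LockGraph()
# Thread t1 takes a then b; thread t2 takes b then a.
graph.acquire("t1", "a")
graph.acquire("t1", "b")
graph.release("t1", "b")
graph.release("t1", "a")
graph.acquire("t2", "b")
graph.acquire("t2", "a")
print(graph.has_cycle())
```

The check reports the inconsistent lock order even though this particular run did not deadlock, which is the benefit of analysing the model rather than waiting for the failure.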
6/36
Monitoring and Checking (MaC)
System requirements are formalised
A monitoring script is constructed:
– to instrument the code
– to establish a mapping from low-level information into high-level events
At run-time, generated events are monitored for compliance with the requirements specification
7/36
MaC: Events and Conditions
Events occur instantaneously during the system execution
Conditions are pieces of information that hold for a duration of time
Three-valued logic: true, false, undefined
PEDL (Primitive Event Definition Language): language for monitoring scripts
MEDL (Meta Event Definition Language): language for safety requirements
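The distinction between events and conditions, and the role of the undefined value, can be illustrated with a small sketch. It is written in Python rather than PEDL/MEDL, and the variables (door_open, moving), the requirement ("the system must not start moving while the door is open") and the function name are all hypothetical assumptions made for the example.

```python
def check_updates(updates):
    """Check one hypothetical requirement over (variable, value) updates.

    Returns one verdict per "moving starts" event:
    True (satisfied), False (violated) or None (undefined).
    """
    # Conditions hold over time; None means "undefined" (not yet observed).
    conditions = {"door_open": None, "moving": None}
    verdicts = []
    for var, value in updates:
        if var not in conditions:
            continue  # low-level information not mapped to any condition
        before = conditions[var]
        conditions[var] = bool(value)
        # An event is instantaneous: the instant at which a condition becomes true.
        started = conditions[var] and before is not True
        if var == "moving" and started:
            door = conditions["door_open"]
            verdicts.append(None if door is None else not door)
    return verdicts


print(check_updates([("door_open", 1), ("moving", 1)]))
```

When the door state has never been observed, the verdict is undefined rather than true or false, which is the situation a three-valued logic is meant to capture.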
8/36
PathExplorer (1/2)
An instrumentation module (using Jtrek): it emits relevant events
An interaction module: it sends events to the observer module
An observer module: it verifies the requirements specification
9/36
PathExplorer (2/2)
Requirements are written in past-time LTL (monitoring operators are added: ↑F, ↓F, [F,F)s, [F,F)w)
Use the recursive nature of past time temporal logic: the satisfaction relation for a formula can be calculated along the execution trace looking only one step backwards (see our paper for the algorithm)
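The one-step-backwards idea can be sketched as follows: the monitor stores only the truth values of all subformulas in the previous state and computes the values for the current state from them. This is a minimal illustration, not the algorithm from the paper; the tuple encoding of formulas, the operator names and the convention that ↑F and ↓F are false in the first state are assumptions of the sketch.

```python
def step(formula, state, prev):
    """Evaluate all subformulas on `state`; `prev` holds their values one step back
    (None in the first state). `state` is the set of atoms true now."""
    now = {}

    def was(f, first):
        return first if prev is None else prev[f]

    def ev(f):
        if f in now:
            return now[f]
        op = f[0]
        if op == "atom":
            v = f[1] in state
        elif op == "not":
            v = not ev(f[1])
        elif op == "and":
            a, b = ev(f[1]), ev(f[2])  # evaluate both so both are stored
            v = a and b
        elif op == "prev":             # previous-state operator
            a = ev(f[1])
            v = was(f[1], a)
        elif op == "since":            # f1 since f2
            a, b = ev(f[1]), ev(f[2])
            v = b or (a and was(f, False))
        elif op == "start":            # ↑F: F holds now but not one step back
            a = ev(f[1])
            v = a and not was(f[1], a)
        elif op == "end":              # ↓F: F held one step back but not now
            a = ev(f[1])
            v = (not a) and was(f[1], a)
        elif op == "interval_s":       # [F1,F2)s: F1 seen, F2 not seen since
            a, b = ev(f[1]), ev(f[2])
            v = (not b) and (a or was(f, False))
        else:
            raise ValueError(f"unknown operator: {op}")
        now[f] = v
        return v

    ev(formula)
    return now


def monitor_trace(formula, trace):
    prev, verdicts = None, []
    for state in trace:
        prev = step(formula, state, prev)  # only one step of history is kept
        verdicts.append(prev[formula])
    return verdicts


is_open = ("interval_s", ("atom", "open"), ("atom", "close"))
print(monitor_trace(is_open, [set(), {"open"}, set(), {"close"}, set()]))
# prints [False, True, True, False, False]
```

Memory use is proportional to the size of the formula and independent of the length of the trace, which is what makes the approach suitable for monitoring at run-time.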
10/36
T&V Techniques based on Field-data
Field-data: “run-time data collected from the field”
Why collect field data for Test and Verification?
– limited knowledge about the final system
e.g., software components are usually developed in isolation, assembled with third-party components and, finally, deployed in unknown environments
– uncertainty about the final environment
e.g., in ubiquitous computing, pervasive computing, mobile computing, and wireless networks, it is not possible to predict every possible situation in advance
– dynamic environments
e.g., in mobile code, self-adaptive systems and peer-to-peer systems, resources suddenly appear and disappear
11/36
Existing Approaches
Field-data has been collected for:
– Evaluating the usability of an application (usability testing)
– Modelling usage of the system
which components, modules and functionalities are used?
– Learning properties of the implementation
– Modelling program faults
which failures have been recognized on the target system?
12/36
Evaluating Usability
Traditionally, data for usability testing has been gathered by running testing sessions
Novel approaches: silent data-gathering systems
– Automatic Navigability Testing System (ANTS) [Rod02]
– Web Variable Instrumented Program (Webvip) [VG]
– Gamma System [OLHL02]
13/36
Silent Data-Gathering Systems (1/2)
[Figure: architectures of the ANTS and Webvip data-gathering systems. Labels: web page (http://...), client-side agent, server agent, ANTS server, data server, communication of user’s actions, script, session file upload, multimedia content]
14/36
Silent Data-Gathering Systems (2/2)
Gamma
figure appeared in [OLHL02]
15/36
Modelling Usage of the System (1/2)
For performing system-specific impact analysis
– Law and Rothermel’s impact analysis [LR03]
the program is instrumented to produce execution traces representing the procedure-level execution flow, e.g., MBrACDrErrrrx (r marks a return, x the program exit)
the impacted set for procedure P is computed by selecting the procedures that are called by P and the procedures that are on the call stack when P returns
– Orso et al.’s impact analysis [OAH03]
entity-level instrumentation: an execution trace is a sequence of traversed entities
a change c on entity e potentially affects all entities of the traces containing e
the impact set is given by the intersection between the potentially affected entities and the result of a forward slice with the variables used in change c as slicing criterion
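The trace-based computation described for Law and Rothermel’s technique can be sketched on the example trace. The function below follows the wording of this slide (procedures called while P is active, plus procedures on the call stack when P returns); the published algorithm may differ in its details, and the function name is hypothetical.

```python
def impact_set(trace, changed):
    """Impacted procedures for `changed`, given a procedure-level trace.

    Each character is a procedure name, "r" (return) or "x" (program exit).
    """
    stack = []            # current call stack
    active = 0            # how many activations of `changed` are on the stack
    impacted = {changed}
    for symbol in trace:
        if symbol == "x":
            break
        if symbol == "r":
            returned = stack.pop()
            if returned == changed:
                active -= 1
                impacted.update(stack)  # procedures that `changed` returns into
        else:
            if active > 0:
                impacted.add(symbol)    # called while `changed` is active
            stack.append(symbol)
            if symbol == changed:
                active += 1
    return impacted


print(sorted(impact_set("MBrACDrErrrrx", "E")))  # prints ['A', 'C', 'E', 'M']
```

For the change in E, nothing is called by E, and M, A and C are on the stack when E returns, so they form the impacted set together with E itself.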
16/36
Modelling Usage of the System (2/2)
Information from impact analysis can be used in regression testing
– Orso et al.’s regression testing [OAH03]
entity-level instrumentation
test suite T’ is initialized with all test cases of the existing test suite T that traverse the change
T’ is augmented with test cases covering uncovered impacted entities, computed with Orso et al.’s impact analysis technique
test suite prioritization is performed by favouring test cases that cover more impacted entities
For increasing confidence in the program
– Pavlopoulou and Young’s perpetual testing [PY99]
normal executions are considered as tests
instrumentation measures statement coverage of uncovered blocks, even in the final environment
the program can be iteratively regenerated to reduce instrumentation
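The selection, augmentation and prioritization steps listed for Orso et al.’s regression testing can be sketched as follows. This is a simplified reading of the slide, not the authors’ implementation: the impacted entities are taken as given, augmentation draws only on the tests already available, and all names are hypothetical.

```python
def select_and_prioritize(coverage, change, impacted):
    """coverage: test name -> set of entities the test traverses.

    Returns (ordered test suite T', impacted entities still uncovered).
    """
    # 1. Initialize T' with the tests that traverse the changed entity.
    selected = [t for t, entities in coverage.items() if change in entities]

    # 2. Augment T' with tests that cover impacted entities not yet covered.
    covered = set().union(*(coverage[t] for t in selected)) & impacted
    uncovered = impacted - covered
    for test, entities in coverage.items():
        if test not in selected and entities & uncovered:
            selected.append(test)
            uncovered -= entities

    # 3. Prioritize: tests covering more impacted entities run first.
    ranked = sorted(selected, key=lambda t: len(coverage[t] & impacted), reverse=True)
    return ranked, uncovered


coverage = {
    "t1": {"a", "b"},
    "t2": {"b", "c", "d"},
    "t3": {"e"},
    "t4": {"z"},
}
suite, missing = select_and_prioritize(coverage, "b", {"b", "c", "d", "e", "f"})
print(suite, missing)
```

Entities that remain in the second result (here f) are impacted but exercised by no available test, so they point to where new test cases would be needed.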
17/36
Learning Properties (1/2)
Automatic synthesis of properties/invariants
– Ernst et al.’s approach [ECGN01]
initially, a large set of invariants is assumed to hold over the monitored variables
each execution can falsify some invariants; falsified invariants are deleted
for each surviving invariant, the probability that it holds “by chance” is computed
if this probability is below a given threshold, the invariant is accepted
the synthesized properties are the set of accepted invariants
Automatic synthesis of programs
– Many approaches from machine learning, but they learn very simple functions
– Lau et al.’s approach [LDW03]
it is still simple, but it learns small computer programs based on accurate execution traces and programming constructs
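The falsification loop of Ernst et al.’s approach described on this slide can be sketched as follows. The candidate invariants and variable names are hypothetical, and the statistical step (estimating the probability that a surviving invariant holds by chance) is deliberately left out of this sketch.

```python
# Hypothetical candidate invariants over two monitored variables x and y.
CANDIDATES = {
    "x >= 0": lambda s: s["x"] >= 0,
    "x <= y": lambda s: s["x"] <= s["y"],
    "x == y": lambda s: s["x"] == s["y"],
    "y != 0": lambda s: s["y"] != 0,
}


def surviving_invariants(samples):
    """Start from all candidates and delete each one falsified by some sample."""
    alive = dict(CANDIDATES)
    for sample in samples:
        alive = {name: pred for name, pred in alive.items() if pred(sample)}
    return set(alive)


observed = [{"x": 1, "y": 2}, {"x": 0, "y": 5}, {"x": 3, "y": 3}]
print(sorted(surviving_invariants(observed)))
# prints ['x <= y', 'x >= 0', 'y != 0']
```

Only "x == y" is falsified (by the first sample); the others survive, but with so few samples they would still have to pass the confidence threshold before being accepted.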
18/36
Learning Properties (2/2)
Synthesized properties, invariants and programs can be used to
– check the implementation with respect to the specification
– verify the safety of updates (in terms of component replacements)
Ernst et al.’s approach has been used to verify the pre-conditions, post-conditions and invariants of implemented services when replacing components [ME03]
– derive test suites
– give the programmer confidence in the implementation
19/36
Test, Verification and Model-Checking (TVM)
Evolution of Testing, Model Checking, and Run-time Verification
Will mention their advantages and disadvantages
Will mention the future research agenda
Conclusion
20/36
TVM
It started with “The Software Crisis” [NATO, 1968]
Led to calls for software “Engineering” [Bauer, 1968]
Focus on methodology for constructing software (e.g. Structured Programming [Dijkstra, 1969]; Chief Programmer Team [Harlan Mills @ IBM, 1973])
21/36
TVM
Higher-level languages viewed as a panacea (C, Java, ML, Meta-ML)
Buggy software was still being produced
Focus shifted to detecting and preventing mistakes during software construction: Testing
22/36
TVM - Testing
Two main approaches to Testing: Reliability Growth Modelling (RGM) and Random Testing
In RGM, the program is tested, fails, is corrected, and is tested again; this cycle repeats many times
The MTBF (Mean Time Between Failures) is entered into a mathematical model derived from previous experience
23/36
TVM - Testing
When the model indicates a very long MTBF, we stop testing and ship the product
Pitfalls of RGM:
Very tenuous (weak) link between past development processes and the current one
Correction of a bug can introduce new bugs, which reduces dependability, and
24/36
TVM - Testing
Industrial practice found that you need extremely large amounts of failure-free testing
RGM is therefore not cost-effective
Random Testing: test cases are selected randomly from a domain of possible inputs
Advantages of Random Testing over RGM:
Random, therefore non-automatable, you are more likely to find errors, and
25/36
TVM - Testing
Random testing draws on tools from information theory to analyse results
Pitfalls of Random Testing:
The distribution of random test cases may not be the same as the real usage of the system
Random testing takes no account of program size: a 10-line program is treated the same as a 10000-line program
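Random selection of test cases from an input domain can be sketched in a few lines. The function under test (buggy_abs), the oracle, the uniform input distribution and the fixed seed are all assumptions made for this example; as noted above, a uniform distribution need not match the real usage of a system.

```python
import random


def random_test(func, oracle, generate, runs=1000, seed=0):
    """Run `func` on randomly generated inputs and collect those where it
    disagrees with `oracle`."""
    rng = random.Random(seed)  # fixed seed so a failing run can be repeated
    failures = []
    for _ in range(runs):
        x = generate(rng)
        if func(x) != oracle(x):
            failures.append(x)
    return failures


def buggy_abs(n):
    # Deliberately wrong for -3, -2 and -1.
    return n if n >= -3 else -n


def small_int(rng):
    # Input domain: integers from -5 to 5, sampled uniformly.
    return rng.randint(-5, 5)


failing_inputs = set(random_test(buggy_abs, abs, small_int))
print(sorted(failing_inputs))
```

The loop treats every program the same way regardless of its size or structure: only the input domain and the oracle are specific to the function under test.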
26/36
TVM - Program Review
Buggy software was still being produced
Another panacea tried was Program Review (Software Inspection)
Depends on humans making the right decisions
Prone to human error
27/36
TVM - Program Proving (Theorem Provers)
Solution then became Formal Deductive Reasoning – Program Proving
Automated Theorem Provers (e.g. Isabelle [Camb]) were developed to prove programs correct
A main problem with theorem provers is the impracticality of proving all layers of the system from software programs to hardware to circuits
28/36
TVM - Model Checking
An alternative approach to theorem proving is model checking
In model checking, the specification of a system is expressed in temporal logic and the system is modelled as a finite state-transition graph; a model checker then checks whether the graph satisfies the temporal logic specification
29/36
TVM - Model Checking
Advantages over theorem provers:
Algorithmic, so the user need only press a button and wait for the result, whereas with theorem provers a user may need to direct the prover to find a solution
Gives a counterexample if the formula is not satisfied
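The push-button nature of model checking and the counterexample it returns can be illustrated on the simplest kind of property, an invariant that must hold in every reachable state. The sketch below explores a finite transition graph breadth-first; it covers only this fragment of temporal logic, and the two-process example without any locking is a hypothetical system chosen for illustration.

```python
from collections import deque


def check_invariant(initial, successors, holds):
    """Explore all reachable states; return None if `holds` is true everywhere,
    otherwise a shortest path from `initial` to a violating state."""
    parent = {initial: None}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if not holds(state):
            path = []
            while state is not None:  # walk back to the initial state
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None


# Two processes that each cycle idle -> trying -> critical -> idle, unsynchronized.
NEXT = {"idle": "trying", "trying": "critical", "critical": "idle"}


def successors(state):
    a, b = state
    return [(NEXT[a], b), (a, NEXT[b])]


def mutual_exclusion(state):
    return state != ("critical", "critical")


counterexample = check_invariant(("idle", "idle"), successors, mutual_exclusion)
print(counterexample)
```

The returned path is a concrete execution leading to both processes being in the critical section, which is the kind of diagnostic a theorem prover does not provide automatically.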
30/36
Model Checking
Disadvantages of model checking:
Computational complexity, and
Some information about the system is lost when you turn a system with an infinite number of states into one with a finite number
There are calls for Run-Time Verification of software
31/36
TVM - Run-Time Verification (RTV)
Some ideas of this were presented above.
Observations of some RTV tools:
Simply debuggers with fancy features
Or they provide good tracing mechanisms
Encouraging observations of RTV tools:
Some use LTL (or extensions) to describe the program monitor
32/36
TVM - RTV
Some use LTL as the basis for a Property Specification Language, such as PEDL, MEDL
May be used as a basis for understanding and for theory
33/36
Call to Arms - Future Research Agenda
We need a Theory of Testing
Such a theory should integrate the good aspects of testing, model checking, and run-time verification
I shall mention some approaches (references in our paper)
34/36
Some Approaches to Theory of Testing
Type Systems/Abstract Interpretation
Work from compiling and type systems directed towards optimisation of code can provide good information to direct the selection of test cases
Polymorphism and linearity can help
Very little work so far on the Semantics of Testing (encouraging work from this workshop)
35/36
Some Approaches to Theory of Testing
Developing semantic structures (e.g. of domain) that facilitate testing may be something to look at
Semantics of A.I. Planning to provide a basis for semantics of run-time verification (ref. in our paper)
Domain theory in concurrency to provide semantics for distributed system testing (ref. in paper)
36/36
Conclusions
Call to arms for theory builders and tool builders
Come up with good theories and better tools
Provide tools that software professionals can use to specify, design, build, test, audit, and monitor systems
Let’s do it !!!