Post on 14-Jan-2016

Learning Symbolic Interfaces of Software Components

Zvonimir Rakamarić

This Work

• Published at Static Analysis Symposium 2012
• Joint work with Dimitra Giannakopoulou (NASA) and Vishwanath Raman (CMU/NASA)

Introduction

Motivating Example

class Example {
  private static int x = 0;
  private static int y = 0;

  public static void init(int p, int q) { x = p; y = q; }
  public static void a() { if (x == 0) y = 10; else y = 11; }
  public static void b() { if (y == 10) assert false; }
}

• init can be called unconditionally

• a can be called unconditionally

• b can be called after init only when y != 10
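
The three observations above can be checked by running the component on concrete inputs. The following is a minimal sketch, not part of the original slides: `ExampleComponent`, `reset`, and `MotivatingExampleDemo` are hypothetical names, and an `IllegalStateException` replaces `assert false` so that the error shows up even when Java assertions are disabled.

```java
// Concrete version of the slide's Example class. An IllegalStateException
// stands in for `assert false` so the error is visible without the -ea flag.
class ExampleComponent {
    private static int x = 0;
    private static int y = 0;

    // Hypothetical helper (not on the slide): restores the initial state.
    static void reset() { x = 0; y = 0; }

    static void init(int p, int q) { x = p; y = q; }
    static void a() { if (x == 0) y = 10; else y = 11; }
    static void b() {
        if (y == 10) throw new IllegalStateException("b() reached with y == 10");
    }
}

class MotivatingExampleDemo {
    // Runs a call sequence from the initial state; true means no error occurred.
    static boolean isLegal(Runnable sequence) {
        ExampleComponent.reset();
        try {
            sequence.run();
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // <init;b> is legal or illegal depending on the value passed as q.
        System.out.println(isLegal(() -> { ExampleComponent.init(0, 5); ExampleComponent.b(); }));
        System.out.println(isLegal(() -> { ExampleComponent.init(0, 10); ExampleComponent.b(); }));
        // <init;a;b> depends on p instead, because a() overwrites y.
        System.out.println(isLegal(() -> {
            ExampleComponent.init(0, 5); ExampleComponent.a(); ExampleComponent.b(); }));
        System.out.println(isLegal(() -> {
            ExampleComponent.init(1, 5); ExampleComponent.a(); ExampleComponent.b(); }));
    }
}
```

Whether a sequence such as <init;b> is legal depends on the arguments passed to init, so a single yes/no answer per method sequence cannot describe it.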

Goal

• Learn temporal interfaces of software components
  - Legal and illegal sequences of method calls defined as an automaton
• Why?
  - Documentation
  - Reverse engineering
  - Model-based testing
  - Regression testing
  - Compositional verification
  - …

Limitations of Prior Approaches

Since method b in Example cannot be called unconditionally after init, prior approaches either
• consider calling b after init an error no matter what the values of the parameters it depends on are, or
• expect init to be manually partitioned

Our Contribution

class Example { ...}

Background

Symbolic Execution

• Key idea: execution of programs using symbolic input values instead of concrete data
• Concrete vs symbolic
  - Concrete execution: program takes only one path, determined by input values
  - Symbolic execution: program can take any feasible path (coverage!)
    - Limited by the power of the constraint solver
    - Scalability issues when faced with a large (exponential) number of paths (path explosion)

Symbolic Program State

• Symbolic values of program variables
• Path condition (PC)
  - Logical formula over symbolic inputs
  - Accumulates constraints that inputs have to satisfy for the particular path to be executed
  - If a path is feasible, its PC is satisfiable
• Program location

Symbolic Execution Tree

• Characterizes execution paths constructed during symbolic execution
• Nodes are symbolic program states
• Edges are labeled with program transitions

Example

int x, y;
if (x > y) {
  x = x + y;
  y = x - y;
  x = x - y;
  if (x > y)
    assert false;
}

Symbolic execution tree (X and Y are the symbolic values of x and y):

x:X, y:Y, PC: true
  x > y is true: x:X, y:Y, PC: X>Y (SAT)
    after x = x + y: x:X+Y, y:Y, PC: X>Y
    after y = x - y: x:X+Y, y:X, PC: X>Y
    after x = x - y: x:Y, y:X, PC: X>Y
      x > y is true: x:Y, y:X, PC: X>Y ∧ Y>X (UNSAT)
      x > y is false: x:Y, y:X, PC: X>Y ∧ Y<=X (SAT)
  x > y is false: x:X, y:Y, PC: X<=Y (SAT)
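
The SAT/UNSAT labels in the example come from asking a constraint solver whether each path condition has a solution. As a rough illustration only, the sketch below swaps the solver for a brute-force search over a small integer range; `PathConditionDemo`, `PathCondition`, and `satisfiableWithin` are hypothetical names. A bounded search can confirm that a solution exists, but it cannot prove unsatisfiability in general (here X>Y ∧ Y>X is contradictory for all integers, so the bounded answer happens to be exact).

```java
class PathConditionDemo {
    // A path condition over the two symbolic inputs X and Y.
    interface PathCondition {
        boolean holds(int x, int y);
    }

    // Brute-force stand-in for a constraint solver: searches [-bound, bound]^2.
    static boolean satisfiableWithin(PathCondition pc, int bound) {
        for (int x = -bound; x <= bound; x++) {
            for (int y = -bound; y <= bound; y++) {
                if (pc.holds(x, y)) return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        PathCondition elseBranch = (x, y) -> x <= y;           // outer if not taken
        PathCondition assertPath = (x, y) -> x > y && y > x;   // path reaching assert false
        PathCondition skipAssert = (x, y) -> x > y && y <= x;  // inner if not taken

        System.out.println("X<=Y:        " + satisfiableWithin(elseBranch, 5));
        System.out.println("X>Y && Y>X:  " + satisfiableWithin(assertPath, 5));
        System.out.println("X>Y && Y<=X: " + satisfiableWithin(skipAssert, 5));
    }
}
```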

Active Automata Learning

• D. Angluin, 1987: “Learning Regular Sets from Queries and Counterexamples”
• The algorithm is called L*
• L* learns an unknown regular language U (over alphabet $\Sigma$) and produces a minimal DFA A such that L(A) = U
• Complexity of the original algorithm is $O(|\Sigma| \cdot |A|^3)$

Active Automata Learning cont.

• The L* learner communicates with a teacher using two types of queries
  - Membership queries: Should word w be included in L(A)? Expected answer: yes/no
  - Equivalence queries: Here is a conjectured DFA A; is L(A) = U? Expected answer: yes/no + counterexample

[Figure: L* Learner and Teacher. The learner sends word w and receives yes/no; it sends DFA A and receives yes/no + cex. Output: DFA A]
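
The two query types can be pictured as a small interface. The sketch below is illustrative only and does not implement the L* learner itself: `Teacher`, `EvenAsTeacher`, and the target language (words over {a, b} with an even number of a's) are assumptions made for the example, and the equivalence query is approximated by comparing all words up to a fixed length rather than deciding true equivalence.

```java
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical teacher interface covering the two query types.
interface Teacher {
    // Membership query: is the word in the unknown language U?
    boolean isMember(String word);

    // Equivalence query: empty means "yes"; otherwise a misclassified word.
    Optional<String> findCounterexample(Predicate<String> conjecture);
}

// Toy teacher whose language U is "words over {a, b} with an even number of a's".
class EvenAsTeacher implements Teacher {
    private final int maxLength;

    EvenAsTeacher(int maxLength) { this.maxLength = maxLength; }

    @Override
    public boolean isMember(String word) {
        return word.chars().filter(c -> c == 'a').count() % 2 == 0;
    }

    @Override
    public Optional<String> findCounterexample(Predicate<String> conjecture) {
        return search("", conjecture);
    }

    // Depth-first comparison of conjecture and U on all words up to maxLength.
    private Optional<String> search(String prefix, Predicate<String> conjecture) {
        if (conjecture.test(prefix) != isMember(prefix)) return Optional.of(prefix);
        if (prefix.length() == maxLength) return Optional.empty();
        for (char c : new char[] {'a', 'b'}) {
            Optional<String> cex = search(prefix + c, conjecture);
            if (cex.isPresent()) return cex;
        }
        return Optional.empty();
    }
}
```

For instance, `new EvenAsTeacher(4).findCounterexample(w -> true)` returns the word "a", which the accept-everything conjecture misclassifies.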

PSYCO Algorithm

Interface Learning with L*

• L* uses a teacher to answer the following queries
  - Membership queries: whether a given sequence of method calls leads to an error in the implementation
  - Equivalence queries: whether a conjectured DFA captures all the behaviors of the implementation

Answering Membership Queries

Running Example

class Example {
  private static int x = 0;
  private static int y = 0;

  public static void init(int p, int q) { x = p; y = q; }
  public static void a() { if (x == 0) y = 10; else y = 11; }
  public static void b() { if (y == 10) assert false; }
}

Executing query <init;b>

Symbolic execution of the query (P and Q are the symbolic values of p and q):

p:P, q:Q, PC: true
  after init: x:P, y:Q, PC: true
    after b: OK, PC: Q != 10
    after b: ERROR, PC: Q == 10

Refinement: Split init

public static void init(int p, int q) { x = p; y = q; }

public static void init_0(int p, int q) { assume q != 10; init(p, q); }
public static void init_1(int p, int q) { assume q == 10; init(p, q); }

init_0 := init[q != 10]
init_1 := init[q == 10]
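
One way to read the split is that each new alphabet symbol pairs the method name with a guard over its parameters, and the guards together partition the inputs of the original method. The sketch below illustrates this on sampled concrete inputs; `GuardedSymbol`, `SplitInitDemo`, and all helper names are hypothetical, and `initThenBIsLegal` hard-codes the outcome of <init;b> taken from the running example rather than executing the component.

```java
import java.util.function.BiPredicate;

// Hypothetical representation of an alphabet symbol such as init[q != 10].
class GuardedSymbol {
    final String name;
    final BiPredicate<Integer, Integer> guard;

    GuardedSymbol(String name, BiPredicate<Integer, Integer> guard) {
        this.name = name;
        this.guard = guard;
    }

    boolean admits(int p, int q) { return guard.test(p, q); }
}

class SplitInitDemo {
    static final GuardedSymbol INIT_0 = new GuardedSymbol("init_0", (p, q) -> q != 10);
    static final GuardedSymbol INIT_1 = new GuardedSymbol("init_1", (p, q) -> q == 10);

    // Outcome of <init;b> on concrete arguments: b fails exactly when y == q == 10.
    static boolean initThenBIsLegal(int p, int q) { return q != 10; }

    // True if every sampled (p, q) is admitted by exactly one of the two symbols.
    static boolean partitions(GuardedSymbol s0, GuardedSymbol s1, int bound) {
        for (int p = -bound; p <= bound; p++)
            for (int q = -bound; q <= bound; q++)
                if (s0.admits(p, q) == s1.admits(p, q)) return false;
        return true;
    }

    // True if <symbol;b> has the expected outcome for every sampled input the guard admits.
    static boolean uniformOutcome(GuardedSymbol s, boolean expectedLegal, int bound) {
        for (int p = -bound; p <= bound; p++)
            for (int q = -bound; q <= bound; q++)
                if (s.admits(p, q) && initThenBIsLegal(p, q) != expectedLegal) return false;
        return true;
    }

    public static void main(String[] args) {
        System.out.println(partitions(INIT_0, INIT_1, 20));      // guards do not overlap or leave gaps
        System.out.println(uniformOutcome(INIT_0, true, 20));    // <init_0;b> is always legal
        System.out.println(uniformOutcome(INIT_1, false, 20));   // <init_1;b> is always an error
    }
}
```

After the split, the query <init;b> that had a mixed outcome becomes two queries, <init_0;b> and <init_1;b>, each with a single answer.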

Restart Learning

• New learner alphabet: {init_0, init_1, a, b}
• Learning restarts, re-using results from previous iterations

Executing query <init_0;a;b>

class Example {
  private static int x = 0;
  private static int y = 0;

  public static void init_0(int p, int q) { assume q != 10; x = p; y = q; }
  public static void a() { if (x == 0) y = 10; else y = 11; }
  public static void b() { if (y == 10) assert false; }
}

p:P, q:Q, PC: true
  after init_0: x:P, y:Q, PC: true
    after a: x:P, y:10, PC: P = 0
      after b: ERROR, PC: P = 0
    after a: x:P, y:11, PC: P != 0
      after b: OK, PC: P != 0

Refinement: Split init_0

public static void init_0(int p, int q) { assume q != 10; x = p; y = q; }

public static void init_0_0(int p, int q) { assume p == 0 && q != 10; init(p, q); }
public static void init_0_1(int p, int q) { assume p != 0 && q != 10; init(p, q); }

init_0_0 := init[q != 10 && p == 0]
init_0_1 := init[q != 10 && p != 0]

Restart Learning

• New learner alphabet: {init_0_0, init_0_1, init_1, a, b}
• Learning restarts

Answering Equivalence Queries

Unbounded Loops in Conjectures

Components have no loops, but conjectures do!

We unroll unbounded loops in conjectures a bounded number of times

Answering Equivalence Queries

• Walk the conjectured automaton and extract
  - all legal method sequences to a given depth k
  - all illegal method sequences: for each illegal sequence of depth n, extract the legal sequence of depth n - 1
• We then use membership queries to check the outcome of each sequence
  - If a sequence is misclassified by the learner, we have a counterexample for L*
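
A simplified version of this depth-bounded check can be sketched as follows. It is not PSYCO's implementation: it compares the conjecture and the component on every sequence up to depth k instead of extracting legal and illegal sequences separately, and it answers membership queries by running one concrete representative input per guarded symbol. That shortcut is an assumption that only holds when all inputs admitted by a guard behave alike; for init_1 it no longer holds at depth 3, where the outcome of <init_1;a;b> depends on p. `BoundedEquivalenceDemo` and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

class BoundedEquivalenceDemo {
    // Refined alphabet from the running example.
    static final List<String> ALPHABET =
            Arrays.asList("init_0_0", "init_0_1", "init_1", "a", "b");

    // Membership oracle: executes the sequence on one representative input per guard.
    static boolean isLegal(List<String> sequence) {
        int x = 0, y = 0;
        for (String m : sequence) {
            switch (m) {
                case "init_0_0": x = 0; y = 0; break;    // representative of p == 0 && q != 10
                case "init_0_1": x = 1; y = 0; break;    // representative of p != 0 && q != 10
                case "init_1":   x = 0; y = 10; break;   // representative of q == 10
                case "a": y = (x == 0) ? 10 : 11; break;
                case "b": if (y == 10) return false; break;
                default: throw new IllegalArgumentException("unknown symbol: " + m);
            }
        }
        return true;
    }

    // Returns a sequence of length <= k on which conjecture and component disagree, if any.
    static Optional<List<String>> findCounterexample(Predicate<List<String>> conjecture, int k) {
        return search(new ArrayList<>(), conjecture, k);
    }

    private static Optional<List<String>> search(
            List<String> prefix, Predicate<List<String>> conjecture, int k) {
        if (conjecture.test(prefix) != isLegal(prefix)) {
            return Optional.of(new ArrayList<>(prefix));
        }
        if (prefix.size() == k) return Optional.empty();
        for (String m : ALPHABET) {
            prefix.add(m);
            Optional<List<String>> cex = search(prefix, conjecture, k);
            prefix.remove(prefix.size() - 1);
            if (cex.isPresent()) return cex;
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // A conjecture that declares every sequence legal is refuted at depth 2.
        System.out.println(findCounterexample(seq -> true, 2));
    }
}
```

With the accept-everything conjecture, no disagreement exists at depth 1, and the first one found at depth 2 is the sequence [init_1, b]; the learner would receive it as a counterexample.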

Running Example: Depth is 2

Running Example: Depth is 3

Implementation and Experiments

Architecture of PSYCO

Implementation of PSYCO

• Implemented on top of the Java PathFinder (JPF) software model checking infrastructure: http://babelfish.arc.nasa.gov/trac/jpf
• PSYCO-related modules
  - jpf-psyco: interface generation for Java classes, including parameters; uses jpf-learn and jpf-jdart
  - jpf-learn: implements L*
  - jpf-jdart: symbolic execution in JPF (actually DART/concolic)

Experiments

Example            Methods  k-max  k-min  Conjectures  Refinements  Alphabet  States
SIGNATURE                5      7      2            2            0         5       4
PIPEDOUTPUTSTREAM        4      7      2            2            1         5       3
INTMATH                  8      1      1            1            7        16       3
ALTBIT                   2     27      4            8            3         5       5
CEV-FLIGHTRULE           3      3      3            3            2         5       3
CEV                     18      3      3           10            6        24       9

• k-max is the maximum exploration depth reached in one hour
• k-min is the depth when we realized the expected interface
• Automata do not change between k-min and k-max, and are k-max-full

Summary

• Combined automata learning and symbolic techniques for temporal interface generation
  - Generating richer interfaces with symbolic method guards
• Implemented a prototype tool in Java PathFinder
  - Works well on realistic examples
  - Equivalence queries are a potential bottleneck

Our Contribution cont.

We learn 3-valued Deterministic Finite Automata

[Figure: 3-valued DFA with states INITIAL, ERROR, and DON’T KNOW; transitions are labeled mod(p, q)[q > 0 && p >= 0], mod(p, q)[q <= 0 || p < 0], div(p, q)[q == 0], and div(p, q)[q != 0]]

Using 3-Valued DFA

[Figure: DFA with states INITIAL and ERROR; transitions are labeled mod(p, q)[q > 0 && p >= 0], mod(p, q)[q <= 0 || p < 0], div(p, q)[q == 0], and div(p, q)[q != 0]]

Underlying solver returns “Don’t Know”

Using 3-Valued DFA cont.

We learn 3-valued Deterministic Finite Automata

[Figure: 3-valued DFA with states INITIAL, ERROR, and DON’T KNOW; transitions are labeled mod(p, q)[q > 0 && p >= 0], mod(p, q)[q <= 0 || p < 0], div(p, q)[q == 0], and div(p, q)[q != 0]]

Definition of k-full Interface

Interface is k-safe if all legal sequences in the automaton to depth k are also legal executions in the component

Interface is k-permissive if all illegal sequences in the automaton to depth k also lead to errors in the component

Interface is k-tight if all sequences to depth k leading to the don’t know state in the automaton cannot be resolved in the component

Interface that is k-safe, k-permissive, and k-tight is k-full
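
The four definitions above can be written compactly. The notation below is an assumption introduced for illustration and is not taken from the slides: $\Sigma$ is the interface alphabet, $A(\sigma)$ is the verdict of the automaton on a method sequence $\sigma$, and $C(\sigma)$ is the outcome of analyzing $\sigma$ on the component, each ranging over legal, error, and unknown.

```latex
\begin{align*}
\text{$k$-safe:}       &\quad \forall \sigma \in \Sigma^{*},\ |\sigma| \le k:\ A(\sigma) = \mathit{legal} \implies C(\sigma) = \mathit{legal} \\
\text{$k$-permissive:} &\quad \forall \sigma \in \Sigma^{*},\ |\sigma| \le k:\ A(\sigma) = \mathit{error} \implies C(\sigma) = \mathit{error} \\
\text{$k$-tight:}      &\quad \forall \sigma \in \Sigma^{*},\ |\sigma| \le k:\ A(\sigma) = \mathit{unknown} \implies C(\sigma) = \mathit{unknown} \\
\text{$k$-full:}       &\quad \text{$k$-safe} \wedge \text{$k$-permissive} \wedge \text{$k$-tight}
\end{align*}
```

Under this reading, a k-full interface agrees with the component on every sequence of length at most k, including the sequences that the underlying analysis cannot resolve.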

Guarantees of PSYCO Algorithm

Theorem: If the behavior of a component C can be characterized by an interface DFA, then PSYCO terminates with a k-full interface for C.

• Proof is in the SAS paper
• No unbounded loops/recursion in components
• No “mixed parameters”
