
Learning Symbolic Interfaces of Software Components

Zvonimir Rakamarić

This Work

Published at Static Analysis Symposium 2012

Joint work with Dimitra Giannakopoulou (NASA) and Vishwanath Raman (CMU/NASA)

Introduction

Motivating Example

class Example {
  private static int x = 0;
  private static int y = 0;

  public static void init(int p, int q) { x = p; y = q; }
  public static void a() { if (x == 0) y = 10; else y = 11; }
  public static void b() { if (y == 10) assert false; }
}

• init can be called unconditionally

• a can be called unconditionally

• b can be called after init only when y != 10
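The three observations above can be reproduced with concrete calls. The following is a minimal sketch, not part of the original slides: the class name ExampleComponent and the helper isLegal are hypothetical, and the failing `assert false` is modeled by throwing AssertionError directly so that the error is raised even when the JVM runs without -ea.

```java
// Hypothetical harness around the Example component from the slides.
// Assumption: the failing `assert false` is modeled by throwing AssertionError
// directly, so the error shows up even when the JVM runs without -ea.
class ExampleComponent {
    private static int x = 0;
    private static int y = 0;

    static void init(int p, int q) { x = p; y = q; }
    static void a() { if (x == 0) y = 10; else y = 11; }
    static void b() { if (y == 10) throw new AssertionError("b called while y == 10"); }

    // Runs the calls in order; the sequence is legal if no call raises an error.
    static boolean isLegal(Runnable... calls) {
        try {
            for (Runnable call : calls) call.run();
            return true;
        } catch (AssertionError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isLegal(() -> init(0, 5), ExampleComponent::b));   // true
        System.out.println(isLegal(() -> init(0, 10), ExampleComponent::b));  // false
    }
}
```

Concrete runs like these only cover the chosen parameter values; they do not show for which values of p and q a sequence is legal.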

Learn temporal interfaces of software components
• Legal and illegal sequences of method calls defined as an automaton

Why?
• Documentation
• Reverse engineering
• Model-based testing
• Regression testing
• Compositional verification
• …

Limitations of Prior Approaches

Since method b in Example cannot be called unconditionally after init, prior approaches either
• consider calling b after init an error no matter what the values of the parameters it depends on are, or
• expect init to be manually partitioned

Our Contribution

class Example { ...}

Background

Symbolic Execution

Key idea: execution of programs using symbolic input values instead of concrete data

Concrete vs symbolic
• Concrete execution
  • Program takes only one path determined by input values
• Symbolic execution
  • Program can take any feasible path (coverage!)
  • Limited by the power of the constraint solver
  • Scalability issues when faced with a large (exponential) number of paths (path explosion)

Symbolic Program State

• Symbolic values of program variables
• Path condition (PC)
  • Logical formula over symbolic inputs
  • Accumulates constraints that inputs have to satisfy for the particular path to be executed
  • If a path is feasible its PC is satisfiable
• Program location
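The three parts of a symbolic state can be sketched as a small immutable data structure. This is only an illustration under stated assumptions: the class SymState and its methods are hypothetical names, and expressions and constraints are kept as plain strings rather than solver terms.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical representation of a symbolic program state; expressions and
// constraints are plain strings here, not terms of a real constraint solver.
final class SymState {
    final Map<String, String> values;   // program variable -> symbolic expression
    final List<String> pathCondition;   // conjunction of branch constraints
    final int location;                 // index of the next statement

    SymState(Map<String, String> values, List<String> pathCondition, int location) {
        this.values = new LinkedHashMap<>(values);
        this.pathCondition = new ArrayList<>(pathCondition);
        this.location = location;
    }

    // Taking a branch yields a new state whose PC has one more conjunct.
    SymState branch(String constraint, int nextLocation) {
        List<String> pc = new ArrayList<>(pathCondition);
        pc.add(constraint);
        return new SymState(values, pc, nextLocation);
    }

    // An assignment updates one symbolic value and leaves the PC untouched.
    SymState assign(String variable, String expression, int nextLocation) {
        Map<String, String> updated = new LinkedHashMap<>(values);
        updated.put(variable, expression);
        return new SymState(updated, pathCondition, nextLocation);
    }

    // Renders the PC; an empty conjunction is the constraint "true".
    String pc() {
        return pathCondition.isEmpty() ? "true" : String.join(" && ", pathCondition);
    }
}
```

Keeping states immutable means each branch of the execution gets its own copy, so sibling paths cannot interfere with each other.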

Symbolic Execution Tree

Characterizes execution paths constructed during symbolic execution
• Nodes are symbolic program states
• Edges are labeled with program transitions

Example

int x, y;
if (x > y) {
  x = x + y;
  y = x - y;
  x = x - y;
  if (x > y)
    assert false;
}

[Figure: symbolic execution tree for the program above]
• Root: x:X, y:Y, PC: true
• Outer branch, true: x:X, y:Y, PC: X>Y
• Outer branch, false: x:X, y:Y, PC: X<=Y
• After x = x + y: x:X+Y, y:Y, PC: X>Y
• After y = x - y: x:X+Y, y:X, PC: X>Y
• After x = x - y: x:Y, y:X, PC: X>Y
• Inner branch on x > y: PC: X>Y ∧ Y>X is UNSAT, so assert false is unreachable; PC: X>Y ∧ Y<=X is SAT
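The unsatisfiable path condition in this example can be cross-checked by brute force. The sketch below is an assumption-laden stand-in for a constraint solver: the class SwapPaths and its bound parameter are hypothetical, and it enumerates a small input range (small enough to avoid integer overflow) instead of reasoning symbolically.

```java
// Brute-force stand-in for the constraint solver (an assumption of this sketch):
// it enumerates a small input range instead of solving the PC symbolically.
class SwapPaths {
    // Returns true if some input in [-bound, bound] x [-bound, bound]
    // reaches the statement `assert false`.
    static boolean assertReachable(int bound) {
        for (int x0 = -bound; x0 <= bound; x0++) {
            for (int y0 = -bound; y0 <= bound; y0++) {
                int x = x0, y = y0;
                if (x > y) {          // PC: X > Y
                    x = x + y;        // x: X + Y
                    y = x - y;        // y: X
                    x = x - y;        // x: Y
                    if (x > y) {      // PC: X > Y && Y > X
                        return true;
                    }
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(assertReachable(50)); // prints "false"
    }
}
```

The enumeration finds no input that reaches the assertion, which agrees with the UNSAT verdict on the inner true branch; unlike a solver, it says nothing about inputs outside the enumerated range.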

Active Automata Learning

D. Angluin, 1987: “Learning Regular Sets from Queries and Counterexamples”
• Algorithm is called L*
• L* learns unknown regular language U (over alphabet Σ) and produces minimal DFA A such that L(A) = U
• Complexity of the original algorithm is O(|Σ| · |A|^3)

Active Automata Learning cont.

L* learner communicates with a teacher using two types of queries
• Membership queries: Should word w be included in L(A)?
  • Expected answer: yes/no
• Equivalence queries: Here is a conjectured DFA A. Is L(A) = U?
  • Expected answer: yes/no+counterexample

[Figure: the L* Learner sends word w to the Teacher and receives yes/no; equivalence queries are answered with yes/no+cex]
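The learner/teacher protocol can be sketched as a two-method interface. Everything here is illustrative rather than taken from the slides: the names Teacher and EvenATeacher are hypothetical, the conjecture is passed as a predicate over words instead of an explicit DFA, and the toy teacher only approximates the equivalence query on a fixed list of sample words.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical teacher interface covering the two L* query types.
interface Teacher {
    // Membership query: should word w be included in L(A)?
    boolean isMember(List<String> word);

    // Equivalence query: empty means "yes"; otherwise "no" plus a counterexample.
    Optional<List<String>> checkConjecture(Predicate<List<String>> conjecture);
}

// Toy teacher for U = words over {a, b} containing an even number of a's.
// Assumption: equivalence is approximated on a fixed list of sample words.
class EvenATeacher implements Teacher {
    private final List<List<String>> samples = Arrays.asList(
        Collections.<String>emptyList(),
        Arrays.asList("a"),
        Arrays.asList("b", "a"),
        Arrays.asList("a", "b", "a"));

    public boolean isMember(List<String> word) {
        long count = word.stream().filter("a"::equals).count();
        return count % 2 == 0;
    }

    public Optional<List<String>> checkConjecture(Predicate<List<String>> conjecture) {
        for (List<String> w : samples) {
            // A sample the conjecture classifies differently is a counterexample.
            if (conjecture.test(w) != isMember(w)) return Optional.of(w);
        }
        return Optional.empty();
    }
}
```

A conjecture that accepts every word is rejected with the counterexample [a], while a conjecture that counts a's correctly passes all samples.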

PSYCO Algorithm

Interface Learning with L*

L* uses a teacher to answer the following queries
• Membership queries
  • Whether a given sequence of method calls leads to an error in the implementation
• Equivalence queries
  • Whether a conjectured DFA captures all the behaviors of the implementation

Answering Membership Queries

Running Example

class Example {
  private static int x = 0;
  private static int y = 0;
  ...
}

Executing query <init;b>

[Figure: symbolic execution tree for <init;b> on class Example]
• States: p:P, q:Q with PC: true; x:P, y:Q with PC: true
• Leaves: OK with PC: Q != 10; ERROR with PC: Q == 10
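The outcome of this query is neither a plain yes nor a plain no, because both an OK path and an ERROR path exist. A minimal sketch of that three-way classification follows. It is an assumption-based stand-in: the class InitThenBQuery and the Answer enum are hypothetical, and parameter values are enumerated over a small range instead of being explored symbolically.

```java
// Hypothetical sketch: answers the membership query <init;b> by enumerating a
// small range of parameter values, standing in for symbolic execution.
class InitThenBQuery {
    enum Answer { LEGAL, ILLEGAL, MIXED }

    private static int x, y;

    static void init(int p, int q) { x = p; y = q; }

    // Returns false where the original b() would hit `assert false`.
    static boolean b() { return y != 10; }

    // Classifies <init;b> for all parameters p, q in [lo, hi].
    static Answer run(int lo, int hi) {
        boolean sawOk = false, sawError = false;
        for (int p = lo; p <= hi; p++) {
            for (int q = lo; q <= hi; q++) {
                init(p, q);
                if (b()) sawOk = true; else sawError = true;
            }
        }
        if (sawOk && sawError) return Answer.MIXED;  // outcome depends on parameters
        return sawOk ? Answer.LEGAL : Answer.ILLEGAL;
    }
}
```

With a range that contains 10, run returns MIXED, which corresponds to the situation in the tree above where the outcome depends on Q.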

Refinement: Split init

public static void init(int p, int q) { x = p; y = q; }

public static void init_0(int p, int q) { assume q != 10; init(p, q); }
public static void init_1(int p, int q) { assume q == 10; init(p, q); }

Outcomes of <init;b>: OK with PC: Q != 10; ERROR with PC: Q == 10

init_0 := init[q != 10]
init_1 := init[q == 10]
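Since `assume` is not a Java statement, the split can be made runnable by modeling it with a helper method. The sketch below is one possible rendering under that assumption: the class SplitInit and the helper assume are hypothetical, and rejecting an input with an exception only approximates what a symbolic executor does when it prunes a path.

```java
// Hypothetical Java rendering of the split: `assume` is not a Java statement,
// so this sketch models it with a helper that rejects excluded inputs.
class SplitInit {
    static int x = 0;
    static int y = 0;

    // Rejects inputs outside the guard; a symbolic executor would prune the path.
    static void assume(boolean condition) {
        if (!condition) throw new IllegalArgumentException("assumption violated");
    }

    static void init(int p, int q) { x = p; y = q; }

    static void init_0(int p, int q) { assume(q != 10); init(p, q); }
    static void init_1(int p, int q) { assume(q == 10); init(p, q); }

    // Returns false where the original b() would hit `assert false`.
    static boolean b() { return y != 10; }
}
```

After the split, b succeeds after every admissible call to init_0 and fails after every admissible call to init_1, so each new alphabet symbol has a uniform outcome for the query.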

Restart Learning

New learner alphabet: {init_0, init_1, a, b}

Learning restarts, re-using results from previous iterations

Executing query <init_0;a;b>

public static void init_0(int p, int q) { assume q != 10; x = p; y = q; }
public static void a() { if (x == 0) y = 10; else y = 11; }
public static void b() { if (y == 10) assert false; }

[Figure: symbolic execution tree for <init_0;a;b>]
• States: x:P, y:10 with PC: P = 0; x:P, y:11 with PC: P != 0
• Leaves: ERROR with PC: P = 0; OK with PC: P != 0

Refinement: Split init_0

public static void init_0(int p, int q) { assume q != 10; x = p; y = q; }

public static void init_0_0(int p, int q) { assume p == 0 && q != 10; init(p, q); }
public static void init_0_1(int p, int q) { assume p != 0 && q != 10; init(p, q); }

Outcomes of <init_0;a;b>: ERROR with PC: P = 0; OK with PC: P != 0

init_0_0 := init[q != 10 && p == 0]
init_0_1 := init[q != 10 && p != 0]

Restart Learning

New learner alphabet: {init_0_0, init_0_1, init_1, a, b}

Learning restarts

Answering Equivalence Queries

Unbounded Loops in Conjectures

Components have no loops, but conjectures do!

We unroll unbounded loops in conjectures a bounded number of times

Answering Equivalence Queries

Walk the conjectured automaton and extract
• all legal method sequences to a given depth k
• all illegal method sequences: for each illegal sequence of depth n, extract the legal sequence of depth n - 1

We then use membership queries to check the outcome of each sequence
• If a sequence is misclassified by the learner, we have a counterexample for L*
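The depth-bounded walk described above can be sketched as a breadth-first traversal of the conjecture. This is a simplified illustration, not PSYCO's implementation: the class Conjecture is hypothetical, state 0 is assumed to be initial, a missing transition is taken to mean "illegal", and the membership query is abstracted as a predicate on method sequences.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical conjecture: a DFA in which a missing transition means "illegal".
class Conjecture {
    final Map<Integer, Map<String, Integer>> delta;  // state -> method -> next state
    final List<String> alphabet;

    Conjecture(Map<Integer, Map<String, Integer>> delta, List<String> alphabet) {
        this.delta = delta;
        this.alphabet = alphabet;
    }

    // Walks legal sequences up to depth k. Every one-step extension, legal or
    // illegal in the conjecture, is compared with the membership oracle.
    // Illegal sequences are never extended, so their prefixes are always legal.
    Optional<List<String>> counterexample(int k, Predicate<List<String>> legalInComponent) {
        Deque<List<String>> words = new ArrayDeque<>();
        Deque<Integer> states = new ArrayDeque<>();
        words.add(new ArrayList<String>());
        states.add(0);  // assumption: state 0 is the initial state
        while (!words.isEmpty()) {
            List<String> word = words.poll();
            int state = states.poll();
            if (word.size() == k) continue;  // depth bound reached
            for (String method : alphabet) {
                List<String> extended = new ArrayList<>(word);
                extended.add(method);
                Integer next = delta
                    .getOrDefault(state, Collections.<String, Integer>emptyMap())
                    .get(method);
                boolean legalInConjecture = next != null;
                if (legalInConjecture != legalInComponent.test(extended)) {
                    return Optional.of(extended);  // misclassified sequence
                }
                if (legalInConjecture) {
                    words.add(extended);
                    states.add(next);
                }
            }
        }
        return Optional.empty();
    }
}
```

For the methods a and b of Example called from the initial state (x = 0, y = 0), b fails once a has been called. A one-state conjecture that allows every sequence is therefore refuted at depth 2 by the sequence a, b, while it passes at depth 1, which shows how the result depends on the chosen k.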

Running Example: Depth is 2

public static void init(int p, int q) { x = p; y = q; }
public static void a() { if (x == 0) y = 10; else y = 11; }
public static void b() { if (y == 10) assert false; }

Running Example: Depth is 3

Implementation and Experiments

Architecture of PSYCO

Implementation of PSYCO

Implemented on top of Java PathFinder (JPF) software model checking infrastructure
http://babelfish.arc.nasa.gov/trac/jpf

PSYCO-related modules
• jpf-psyco: interface generation for Java classes including parameters
  • uses jpf-learn and jpf-jdart
• jpf-learn: implements L*
• jpf-jdart: symbolic execution in JPF
  • actually DART/concolic

Experiments

Example            Methods  k-max  k-min  Conjectures  Refinements  Alphabet  States
SIGNATURE                5      7      2            2            0         5       4
PIPEDOUTPUTSTREAM        4      7      2            2            1         5       3
INTMATH                  8      1      1            1            7        16       3
ALTBIT                   2     27      4            8            3         5       5
CEV-FLIGHTRULE           3      3      3            3            2         5       3
CEV                     18      3      3           10            6        24       9

k-max is the maximum exploration depth reached in one hour
k-min is the depth when we realized the expected interface

Automata do not change between k-min and k-max, and are k-max-full

Summary

• Combined automata learning and symbolic techniques for temporal interface generation
• Generating richer interfaces with symbolic method guards
• Implemented a prototype tool in Java PathFinder
• Works well on realistic examples
• Equivalence queries are a potential bottleneck

Our Contribution cont.

We learn 3-valued Deterministic Finite Automata

[Figure: 3-valued DFA with states INITIAL and DON’T KNOW; transitions labeled mod(p, q)[q > 0 && p >= 0], mod(p, q)[q <= 0 || p < 0], div(p, q)[q == 0], and div(p, q)[q != 0]]

Using 3-Valued DFA

[Figure: DFA with state INITIAL; transitions labeled mod(p, q)[q > 0 && p >= 0], mod(p, q)[q <= 0 || p < 0], div(p, q)[q == 0], and div(p, q)[q != 0]]

Underlying solver returns “Don’t Know”

Using 3-Valued DFA cont.

We learn 3-valued Deterministic Finite Automata

[Figure: 3-valued DFA with states INITIAL and DON’T KNOW; transitions labeled mod(p, q)[q > 0 && p >= 0], mod(p, q)[q <= 0 || p < 0], div(p, q)[q == 0], and div(p, q)[q != 0]]

Definition of k-full Interface

Interface is k-safe if all legal sequences in the automaton to depth k are also legal executions in the component

Interface is k-permissive if all illegal sequences in the automaton to depth k also lead to errors in the component

Interface is k-tight if all sequences to depth k leading to the don’t know state in the automaton cannot be resolved in the component

Interface that is k-safe, k-permissive, and k-tight is k-full
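One way to read the three definitions is as a per-sequence check that compares the automaton's verdict with the component's. The following sketch makes that reading explicit. It is only an illustration: the class KFullCheck and both enums are hypothetical names, and a component verdict of DONT_KNOW stands for a sequence that cannot be resolved.

```java
import java.util.Optional;

// Hypothetical per-sequence check of the k-full conditions.
class KFullCheck {
    enum Verdict { LEGAL, ILLEGAL, DONT_KNOW }
    enum Property { K_SAFE, K_PERMISSIVE, K_TIGHT }

    // Returns the property that one sequence (of depth at most k) violates,
    // or empty if automaton and component agree on it.
    static Optional<Property> violated(Verdict inAutomaton, Verdict inComponent) {
        if (inAutomaton == inComponent) return Optional.empty();
        switch (inAutomaton) {
            case LEGAL:   return Optional.of(Property.K_SAFE);        // legal but not executable
            case ILLEGAL: return Optional.of(Property.K_PERMISSIVE);  // illegal but no error
            default:      return Optional.of(Property.K_TIGHT);       // don't know but resolvable
        }
    }
}
```

An interface is k-full under this reading when the check returns empty for every sequence up to depth k.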

Guarantees of PSYCO Algorithm

Theorem: If the behavior of a component C can be characterized by an interface DFA, then PSYCO terminates with a k-full interface for C.
• Proof is in the SAS paper
• No unbounded loops/recursion in components
• No “mixed parameters”
