Program Analysis Using Randomization

Sumit Gulwani, George Necula (U.C. Berkeley)

What kind of Analysis?

Any analysis that can be modeled as checking equivalence of two expressions at a point in a program.

This is equivalent to checking reachability properties.

Complexity of our algorithm

(Almost) linear time. Queries are answered in (almost) constant time.

The Randomized Strategy

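Define a mapping F: Expression → Polynomial, with P1 = F(E1) and P2 = F(E2), such that

P1 ≡ P2 ⇒ E1 ≡ E2 (Soundness)

E1 ≡ E2 [according to some theory T] ⇒ P1 ≡ P2 (Completeness w.r.t. T)

For a loop-free program, F(E) = Σi Predi × vi, where Predi is the predicate of path i and vi is the value of E along that path.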

Example

if C1 then x = 3 else x = 7

if C2 then y = 5 else y = 9

t = x + y

F(t) = C1 C2 (3+5)

+ C1 ¬C2 (3+9)

+ ¬C1 C2 (7+5)

+ ¬C1 ¬C2 (7+9)
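The path-sum form of F(t) can be compared numerically with a per-variable computation. The sketch below is only an illustration: it assumes that ¬C is represented by 1 - C and that arithmetic is done modulo a prime P, and the function names are hypothetical.

```python
import random

P = 2_147_483_647  # a large prime; working modulo P is an assumption of this sketch

def f_t_by_paths(c1, c2):
    """Sum over the four paths: (path predicate) * (value of t on that path)."""
    n1, n2 = (1 - c1) % P, (1 - c2) % P   # values standing in for ¬C1 and ¬C2
    return (c1 * c2 * (3 + 5) + c1 * n2 * (3 + 9)
            + n1 * c2 * (7 + 5) + n1 * n2 * (7 + 9)) % P

def f_t_by_joins(c1, c2):
    """Combine x and y separately at each branch, then add them."""
    x = (c1 * 3 + (1 - c1) * 7) % P
    y = (c2 * 5 + (1 - c2) * 9) % P
    return (x + y) % P

random.seed(0)
samples = [(random.randrange(P), random.randrange(P)) for _ in range(5)]
agree = all(f_t_by_paths(a, b) == f_t_by_joins(a, b) for a, b in samples)
print(agree)
```

The second function touches each branch once, while the first enumerates every path; the two agree at every sample point because they are the same polynomial in c1 and c2.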

Checking Polynomial Equivalence

P1 ≡ P2 can be determined by random testing with small error probability (Probabilistic Soundness)

F: Expression → Polynomial

can be thought of as

F: Expression → [List of numbers]
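Random testing of a polynomial identity can be sketched as follows. This is a minimal illustration, not the authors' implementation: the prime modulus, the default of three trials, and the name `probably_equal` are assumptions.

```python
import random

P = 2_147_483_647  # prime modulus (assumed choice)

def probably_equal(p1, p2, num_vars, trials=3, rng=random):
    """Return False if a random point separates p1 and p2, otherwise True.

    A True answer can be wrong, with a probability that shrinks as trials grow.
    """
    for _ in range(trials):
        point = [rng.randrange(P) for _ in range(num_vars)]
        if p1(*point) % P != p2(*point) % P:
            return False
    return True

# (a + b)^2 against a^2 + 2ab + b^2 (same polynomial) and a^2 + b^2 (different)
lhs = lambda a, b: (a + b) ** 2
rhs_same = lambda a, b: a * a + 2 * a * b + b * b
rhs_diff = lambda a, b: a * a + b * b
```

Calling `probably_equal(lhs, rhs_same, 2)` always returns True, while `probably_equal(lhs, rhs_diff, 2)` returns False unless every sampled point happens to make 2ab vanish modulo P.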

Algorithm

Statements with side effects: x = e and Mem[e1] = e2

x = e

Record in the register table: Store(x) ← F(e)

The register table is simply an array.

Mem[e1] = e2

Record in the memory table: F(e1) ← F(e2)

Arithmetic and Uninterpreted Functions

F(n) → [n, n, n]

F(x) → Store(x) if x was defined before use

| Rand() otherwise

F(e1 + e2) → F(e1) + F(e2)

F(e1 - e2) → F(e1) - F(e2)

F(e1 * e2) → F(e1) * F(e2)

F(e1 / e2) → F(e1) / F(e2)

F(U(e1, …, en)) → Rand(U, F(e1), …, F(en))
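The rules above can be read as a small evaluator over lists of numbers. The sketch below makes several assumptions that the slides do not state: arithmetic is done modulo a prime P (so division becomes a modular inverse), an undefined variable keeps its random values once chosen, and Rand(U, …) is memoized on its arguments. The class and method names are hypothetical.

```python
import random

P = 2_147_483_647   # prime modulus, so that division is a modular inverse (assumed)
K = 3               # length of each list, matching the [n, n, n] shape above

class Analyzer:
    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.store = {}   # register table: variable name -> list of K numbers
        self.memo = {}    # remembers Rand(U, args) so equal calls get equal values

    def rand(self):
        return [self.rng.randrange(1, P) for _ in range(K)]

    def F(self, e):
        """e is an int, a variable name, or a tuple (op, arg1, ..., argn)."""
        if isinstance(e, int):
            return [e % P] * K
        if isinstance(e, str):
            if e not in self.store:           # used before defined: random values
                self.store[e] = self.rand()
            return self.store[e]
        op, *args = e
        vals = [self.F(a) for a in args]
        if op == '+':
            return [(a + b) % P for a, b in zip(*vals)]
        if op == '-':
            return [(a - b) % P for a, b in zip(*vals)]
        if op == '*':
            return [(a * b) % P for a, b in zip(*vals)]
        if op == '/':                         # raises ValueError if b is 0 mod P
            return [a * pow(b, -1, P) % P for a, b in zip(*vals)]
        key = (op, tuple(tuple(v) for v in vals))   # uninterpreted function
        if key not in self.memo:
            self.memo[key] = self.rand()
        return self.memo[key]

    def assign(self, x, e):
        self.store[x] = self.F(e)             # Store(x) <- F(e)
```

For instance, after `a = Analyzer()`, `a.assign('p', ('+', 'x', 'y'))` and `a.assign('q', ('+', 'y', 'x'))`, the calls `a.F(('U', 'p'))` and `a.F(('U', 'q'))` return the same list, because the arguments of U evaluate to the same numbers.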

Joins

F(Φ(c: e1, ¬c: e2)) = F(e1 ⊕r e2)

r = F(c)

e1 ⊕r e2 ≡ r × e1 + (1-r) × e2

Note that r + (1-r) = 1

Linear equalities are preserved.

Furthermore, if 0 ≤ r ≤ 1, linear inequalities will be preserved.
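A small numeric check of the join, as a sketch with hypothetical sample values: both branches satisfy x = 2y + 3 and x ≥ y, and exact rational arithmetic is used so the comparisons are unambiguous.

```python
from fractions import Fraction

def join(r, e1, e2):
    """r*e1 + (1-r)*e2: an affine combination, since the weights sum to 1."""
    return r * e1 + (1 - r) * e2

# Both branches satisfy x = 2y + 3 and x >= y (hypothetical values).
x1, y1 = 7, 2
x2, y2 = 13, 5

r_any = Fraction(9, 2)          # any weight keeps the equality
x, y = join(r_any, x1, x2), join(r_any, y1, y2)
equality_kept = (x == 2 * y + 3)

r_unit = Fraction(1, 4)         # a weight in [0, 1] also keeps the inequality
xu, yu = join(r_unit, x1, x2), join(r_unit, y1, y2)
inequality_kept = (xu >= yu)

inequality_lost = (x < y)       # with r_any outside [0, 1], x >= y no longer holds
```

With r = 9/2 the joined values are x = -14 and y = -17/2, so the equality survives while the inequality does not; with r = 1/4 both survive.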

Preservation of Linear Invariants

if c then { y1 = …; x1 = a y1 + b } else { y2 = …; x2 = a y2 + b }

y = Φ(c: y1, ¬c: y2)

x = Φ(c: x1, ¬c: x2)

assert (x = ay + b)

F(y) = r F(y1) + (1-r) F(y2)

F(x) = r F(x1) + (1-r) F(x2)

= r F(ay1 + b) + (1-r) F(ay2 + b)

= ar F(y1) + a(1-r) F(y2) + r b + (1-r) b

= a F(y) + b

Locking Example (Joins)

Lock = L0

if C then Lock++

if C then Lock--

assert (Lock = L0)

Locking Example (Joins)

L1 = L0

if C then L2 = L1 + 1

L3 = Φ(c: L2, ¬c: L1)

if C then L4 = L3 - 1

L5 = Φ(c: L4, ¬c: L3)

assert (L5 = L0)

F(L1) = F(L0)

F(L2) = F(L1) + 1

F(L3) = r F(L2) + (1-r) F(L1)

= F(L0) + r

F(L4) = F(L3) - 1

F(L5) = r F(L4) + (1-r) F(L3)

= F(L3) - r

= F(L0) + r - r

= F(L0)
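The locking derivation can be replayed with concrete numbers. In this sketch the prime modulus and the function names are assumptions; the point illustrated is that both joins must use the same r, which they do because both branches test the same condition C.

```python
import random

P = 2_147_483_647  # prime modulus (assumed)

def join(r, a, b):
    return (r * a + (1 - r) * b) % P

def lock_after_both_branches(l0, r_first, r_second):
    l1 = l0
    l2 = (l1 + 1) % P
    l3 = join(r_first, l2, l1)      # L3 = Φ(c: L2, ¬c: L1)
    l4 = (l3 - 1) % P
    l5 = join(r_second, l4, l3)     # L5 = Φ(c: L4, ¬c: L3)
    return l5

rng = random.Random(0)
l0, r = rng.randrange(P), rng.randrange(P)

# Same condition on both branches: same r, so the lock value is restored.
same_condition = lock_after_both_branches(l0, r, r) == l0

# Different weights (as for unrelated conditions): the result is l0 - 1, not l0.
different_condition = lock_after_both_branches(l0, r, (r + 1) % P) == l0
```

Here `same_condition` is True and `different_condition` is False, mirroring the cancellation F(L0) + r - r in the derivation.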

Content of Conditionals

P0:

t = 2x - 3y

P1:

x - y == 5 ? (continue on the T branch)

P2:

x + y == 15 ? (continue on the T branch)

P3:

assert (t = 5)

At P0:

F(x) = [1, 2, 3]

F(y) = [1, 4, 9]

At P1:

F(t) = F(2x - 3y)

= [-1, -8, -21]

Content of Conditionals

Before: a conditional C whose T and F branches both use y (… = y + …).

After: the T branch uses yT (… = yT + …) and the F branch uses yF (… = yF + …).

Split F(y) into F(yT) and F(yF) such that

F(yT) = A(F(y))

F(yT) ⊕r F(yF) = F(y), where r = F(c)

A([v1, v2, v3]) → [v1 ⊕r1 v2, v2 ⊕r2 v3, v3 ⊕r3 v1]

Example (Content of Conditionals)

P0:

t = 2x - 3y

P1:

x - y == 5 ? (continue on the T branch)

P2:

xT + yT == 15 ? (continue on the T branch)

P3:

assert (tTT = 5)

At P1:

F(x) = [1, 2, 3]

F(y) = [1, 4, 9]

F(t) = [-1, -8, -21]

F(x - y - 5) = [-5, -7, -11]

At P2:

F(xT) = [-3/2, 1/4, -2/3]

F(yT) = [-13/2, -19/4, -17/3]

F(tT) = [33/2, 59/4, 47/3]

F(xT + yT - 15) = [-23, -39/2, -64/3]

Note that

xT - yT = 5

tT + yT = 10

because t = 2x - 3y

= 2(x-y) - y

= 10 - y

At P3:

F(xTT) = [10, 10, 10]

F(yTT) = [5, 5, 5]

F(tTT) = [5, 5, 5]
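The numbers at P2 and P3 can be reproduced with a short sketch of the A operation. One reading is assumed here, because the slides do not say how r1, r2, r3 are chosen: each ri is picked so that the combined value of the branch expression (x - y - 5, then xT + yT - 15) becomes 0. That choice matches the values listed above; the function name `adjust` is hypothetical.

```python
from fractions import Fraction as Fr

def adjust(e, *vectors):
    """Combine neighbouring samples (i and i+1, cyclically) with the weight r_i
    that makes the combined value of e equal to 0, and apply the same r_i to
    every vector. Assumes neighbouring entries of e differ."""
    n = len(e)
    out = [[] for _ in vectors]
    for i in range(n):
        j = (i + 1) % n
        r = Fr(e[j]) / (Fr(e[j]) - Fr(e[i]))    # solves r*e[i] + (1-r)*e[j] == 0
        for k, v in enumerate(vectors):
            out[k].append(r * v[i] + (1 - r) * v[j])
    return out

x, y = [1, 2, 3], [1, 4, 9]
t = [2 * a - 3 * b for a, b in zip(x, y)]       # values at P1
e1 = [a - b - 5 for a, b in zip(x, y)]          # x - y - 5
xT, yT, tT = adjust(e1, x, y, t)                # values at P2
e2 = [a + b - 15 for a, b in zip(xT, yT)]       # xT + yT - 15
xTT, yTT, tTT = adjust(e2, xT, yT, tT)          # values at P3
```

Under this reading, `tTT` comes out as [5, 5, 5], so the assertion at P3 is established from the two branch conditions alone.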

Conditionals

F(c) → 1 (if our algorithm can prove that c is always true)

→ 0 (if our algorithm can prove that c is always false)

→ Rand(c) (equivalent conditionals get the same random value)

Let c be of the form e == 0, and let F(e) = [v1, v2, v3].

e ≡ 0 ⇒ c is always true

Check: F(e) = F(0)

e ≡ n, n ≠ 0 ⇒ c is always false

Check: v1 = v2 = v3 ≠ 0

e ≡ n1 E + n2, 0 < n2 < n1 ⇒ c is always false, e.g. 2x + 1 ≠ 0

Check: n1 = GCD{v1 - v2, v2 - v3}, n2 = v1 % n1 > 0
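The three checks can be sketched as a single classification function over integer samples. The name `classify`, the return labels, and the extra guard n1 > 1 are assumptions of this sketch. Like the rest of the analysis, the answer is only probabilistically sound: unlucky samples can make an expression look constant or look like a multiple of n1.

```python
from math import gcd

def classify(samples):
    """samples = F(e) as a list of integers, for a conditional of the form e == 0.

    Returns 'true', 'false', or 'unknown' (the caller would then use Rand(c)).
    """
    v1 = samples[0]
    if all(v == 0 for v in samples):
        return 'true'                    # F(e) = F(0)
    if all(v == v1 for v in samples):
        return 'false'                   # e looks like a non-zero constant
    n1 = 0
    for a, b in zip(samples, samples[1:]):
        n1 = gcd(n1, a - b)              # GCD of successive differences
    if n1 > 1 and v1 % n1 > 0:
        return 'false'                   # e = n1*E + n2 with 0 < n2 < n1
    return 'unknown'

f_x = [1, 2, 3]                                    # hypothetical samples for x
always_false = classify([2 * v + 1 for v in f_x])  # 2x + 1 == 0
undecided = classify([v + 1 for v in f_x])         # x + 1 == 0
```

For these samples `always_false` is 'false' (differences have GCD 2 and the remainder is 1) and `undecided` is 'unknown'.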

Detecting Equivalent Conditionals

To check: (e1 == 0) ≡ (e2 == 0)

e1 ≡ e × e2, e ≠ 0 ⇒ (e1 == 0) ≡ (e2 == 0)

For example, (x + 1 == 0) ≡ (2x + 2 == 0)

e ≠ 0 can be checked if we know F(e):

F(e) = F(e1) / F(e2)
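A narrow sketch of this check, covering only the case where e is a non-zero constant (the slides allow any e that can be shown non-zero). The function name and the sample values for x are hypothetical.

```python
from fractions import Fraction

def same_zero_test(f_e1, f_e2):
    """True if every sample of e1 is the same non-zero multiple of the e2 sample."""
    if any(b == 0 for b in f_e2):
        return False                       # ratio undefined; this sketch gives up
    ratios = [Fraction(a, b) for a, b in zip(f_e1, f_e2)]
    return ratios[0] != 0 and all(q == ratios[0] for q in ratios)

f_x = [1, 4, 9]                            # hypothetical samples for x
f_e1 = [v + 1 for v in f_x]                # x + 1
f_e2 = [2 * v + 2 for v in f_x]            # 2x + 2
f_e3 = [v + 2 for v in f_x]                # x + 2

equivalent = same_zero_test(f_e1, f_e2)    # ratio is 1/2 at every sample
not_equivalent = same_zero_test(f_e1, f_e3)
```

Conditionals judged equivalent this way would then share one Rand(c) value, as required by the rule for F(c).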

Loops

Original program:

x = x0;

while c(x) { x = g(x); }

t = x;

Program as analyzed:

x = x0;

while c(x) { x = g(x); }

t = (x);

F((x)) = F(x0 ⊕r1 xi+1)

xi+1 = g(xi)

r1 = Rand(c(xi))

xi = x0 ⊕r2 g(x0)

r2 = Rand()

Linear loop invariants are preserved:

Automatic discovery of invariants

Automatic use of invariants

Example (Loops)

x = 0; y = 1;

while C(x) { x = x + 1; y = y + 2; }

assert (y = 2x + 1)

Example (Loops)

x = 0; y = 1;

while C(x) { x = x + 1; y = y + 2; }

x’ = (x); y’ = (y);

assert (y’ = 2x’ + 1)

F(x’) = F((x))

= F(0 ⊕r1 ((0 ⊕r2 1) + 1))

= r1 × 0 + (1-r1) ((r2 × 0 + (1-r2) × 1) + 1)

= (1-r1) (1 - r2 + 1)

= 2 - 2r1 - r2 + r1r2

F(y’) = F((y))

= F(1 ⊕r1 ((1 ⊕r2 3) + 2))

= r1 × 1 + (1-r1) ((r2 × 1 + (1-r2) × 3) + 2)

= r1 + (1-r1) (5 - 2r2)

= 5 - 4r1 - 2r2 + 2r1r2
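The two closed forms can be checked against the invariant y’ = 2x’ + 1 with random r1 and r2. This sketch assumes arithmetic modulo a prime and uses a hypothetical helper `loop_exit_value` that follows the loop rule x0 ⊕r1 g(x0 ⊕r2 g(x0)).

```python
import random

P = 2_147_483_647  # prime modulus (assumed)

def join(r, a, b):
    return (r * a + (1 - r) * b) % P

def loop_exit_value(x0, g, r1, r2):
    """Value used for a variable after the loop: x0 joined with g(x_i),
    where x_i is itself x0 joined with g(x0)."""
    xi = join(r2, x0, g(x0))
    return join(r1, x0, g(xi))

rng = random.Random(0)
r1, r2 = rng.randrange(P), rng.randrange(P)

fx = loop_exit_value(0, lambda x: x + 1, r1, r2)
fy = loop_exit_value(1, lambda y: y + 2, r1, r2)

invariant_holds = fy == (2 * fx + 1) % P
closed_form_x_ok = fx == (2 - 2 * r1 - r2 + r1 * r2) % P
closed_form_y_ok = fy == (5 - 4 * r1 - 2 * r2 + 2 * r1 * r2) % P
```

All three flags are True for any r1 and r2, since 2(2 - 2r1 - r2 + r1r2) + 1 expands to 5 - 4r1 - 2r2 + 2r1r2. The same r1 and r2 must be used for x and y, because both variables are joined at the same program points.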

Memory

First sequence:

M[x] = v6

M[y] = v

M[y+1] = v5

M[2z] = v4

M[2z+1] = v3

M[4z+3] = v2

M[2z+1] = v1

T1 = M[y]

Second sequence:

M[y] = v

M[4z+3] = v2

M[2z+1] = v1

M[2z] = v4

T2 = M[y]

assert (T1 = T2)

Memory

F(Mem[a]) = F(v1 ⊕r1 v2 ⊕r2 … vn ⊕rn v)

= F(r1v1 + r2v2 + … + rnvn + (1-r1-r2-…-rn) v)

ri = F(Conditions under which vi is read)

Example (Memory)

For the first sequence of writes:

T1 = M[y]

= v1 if (y == 2z+1)

v2 if (y != 2z+1 ∧ y == 4z+3)

v4 if (y == 2z)

v otherwise

F(T1) = F(M[y])

= F(r1 v1 + r2 v2 + r4 v4 + (1 - r1 - r2 - r4) v)

where r1 = F(y == 2z+1)

r2 = F(y != 2z+1 ∧ y == 4z+3)

r4 = F(y == 2z)

Example (Memory)

For the second sequence of writes:

T2 = M[y]

= v4 if (y == 2z)

v1 if (y == 2z+1)

v2 if (y != 2z+1 ∧ y == 4z+3)

v otherwise

F(T2) = F(M[y])

= F(r4 v4 + r1 v1 + r2 v2 + (1 - r4 - r1 - r2) v)

where r4 = F(y == 2z)

r1 = F(y == 2z+1)

r2 = F(y != 2z+1 ∧ y == 4z+3)

Example (Memory)

F(T1) = F(r1 v1 + r2 v2 + r4 v4 + (1 - r1 - r2 - r4) v)

F(T2) = F(r4 v4 + r1 v1 + r2 v2 + (1 - r4 - r1 - r2) v)
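The memory-read formula can be evaluated directly to see why the two reads compare equal. In this sketch the weights r1, r2, r4 are random stand-ins for the F(condition) values, the modulus is an assumed prime, and `read_value` is a hypothetical helper.

```python
import random

P = 2_147_483_647  # prime modulus (assumed)

def read_value(hits, default):
    """hits: (r_i, v_i) pairs, one per earlier write that may be the one read.

    Returns r_1*v_1 + ... + r_n*v_n + (1 - r_1 - ... - r_n)*default, modulo P.
    """
    total = sum(r * v for r, v in hits)
    weight = sum(r for r, _ in hits)
    return (total + (1 - weight) * default) % P

rng = random.Random(0)
v, v1, v2, v4 = (rng.randrange(P) for _ in range(4))
r1, r2, r4 = (rng.randrange(P) for _ in range(3))   # stand-ins for F(condition)

f_t1 = read_value([(r1, v1), (r2, v2), (r4, v4)], v)   # order of the first sequence
f_t2 = read_value([(r4, v4), (r1, v1), (r2, v2)], v)   # order of the second sequence
reads_agree = f_t1 == f_t2
```

Because the formula is a plain sum, the order in which the writes were recorded does not matter, and `reads_agree` is True for any choice of weights and values.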

Applications

Program Verification

  Automatic discovery of useful loop invariants

Translation Validation

Compiler Optimizations

  Eliminating redundant computations, branches, memory reads

Partial Evaluation

Interactive Debugging and Testing of Programs

Related Light-weight Techniques

Value Numbering

  Targets structural equivalence of expressions

  Detects only equalities

Random Testing

  Cannot ‘prove’ equivalence of expressions; can only provide a counter-example

  Exponential number of paths

  Even generating input data to execute a particular path is difficult

Conclusion

Comparison with Symbolic Analysis

  Very simple data structure: list of numbers with simple operations and fast judgements

There is a limit to what a linear time analysis can achieve!

  Excellent base to build up more complicated analysis

  Join lazily

“The intriguing possibility that axioms of randomness may constitute a useful fundamental source of truth independent of, but supplementary to, the standard axiomatic structure of mathematics suggests that probabilistic algorithms ought to be sought vigorously.”

- J.T. Schwartz
