Post on 20-Dec-2015
Program Analysis Using Randomization
Sumit Gulwani, George Necula
(U.C. Berkeley)
What kind of Analysis?
Any analysis that can be modeled as checking equivalence of two expressions at a point in a program
Equivalent to checking Reachability Properties
Complexity of our algorithm
(Almost) Linear Time
Queries answered in (almost) constant time
The Randomized Strategy
Define a mapping F: Expression → Polynomial, with P1 = F(E1) and P2 = F(E2), such that
P1 ≡ P2 ⇒ E1 ≡ E2 (Soundness)
E1 ≡ E2 [according to some theory T] ⇒ P1 ≡ P2 (Completeness w.r.t. T)
For a loop-free program, F(E) = Σ_i Pred_i × v_i
Example
[Figure: flowchart. Branch on C1: x = 3 (T) or x = 7 (F). Branch on C2: y = 5 (T) or y = 9 (F). Then t = x + y.]
F(t) = C1 C2 (3+5)
+ C1 ¬C2 (3+9)
+ ¬C1 C2 (7+5)
+ ¬C1 ¬C2 (7+9)
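The path-sum form of F(t) can be checked numerically. The sketch below is illustrative only: it assumes ¬C is interpreted as 1 - C, and the function names `path_sum` and `joined` are made up for this example. It compares the four-path polynomial from the slide with the value obtained by joining x and y separately and then adding.

```python
from fractions import Fraction

def path_sum(c1, c2):
    """F(t) as written on the slide, reading ¬C as 1 - C (an assumption)."""
    n1, n2 = 1 - c1, 1 - c2
    return (c1 * c2 * (3 + 5) + c1 * n2 * (3 + 9)
            + n1 * c2 * (7 + 5) + n1 * n2 * (7 + 9))

def joined(c1, c2):
    """The same value computed compositionally: join x, join y, then add."""
    x = c1 * 3 + (1 - c1) * 7
    y = c2 * 5 + (1 - c2) * 9
    return x + y

# With 0/1 values the polynomial selects one concrete path.
both_true = path_sum(1, 1)     # x = 3, y = 5
both_false = path_sum(0, 0)    # x = 7, y = 9

# With arbitrary rational values both forms still agree.
c1, c2 = Fraction(5, 7), Fraction(-2, 3)
agree = path_sum(c1, c2) == joined(c1, c2)
```

The compositional form needs one join per variable rather than one term per path, which is what keeps the representation small.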
Checking Polynomial Equivalence
P1 ≡ P2 can be determined by random testing with small error probability (Probabilistic Soundness)
F: Expression → Polynomial
can be thought of as
F: Expression → [List of numbers]
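A minimal sketch of random testing for polynomial equivalence, in the spirit of the slide above. The modulus, the number of trials, and the helper name `probably_equal` are assumptions made for illustration; the slides do not specify them. Each trial evaluates both polynomials at one random point, so the list of results plays the role of the "list of numbers".

```python
import random

PRIME = 2_147_483_647  # 2**31 - 1; an assumed field size for the samples

def probably_equal(p1, p2, num_vars, trials=3, rng=None):
    """Return False if some random point separates p1 and p2, else True."""
    rng = rng or random.Random()
    for _ in range(trials):
        # Sample each variable from the non-zero residues modulo PRIME.
        point = [rng.randrange(1, PRIME) for _ in range(num_vars)]
        if p1(*point) % PRIME != p2(*point) % PRIME:
            return False   # a definite counter-example
    return True            # equal on every sample; wrong only with small probability

def lhs(x, y):
    return (x + y) ** 2

def rhs_same(x, y):
    return x * x + 2 * x * y + y * y

def rhs_diff(x, y):
    return x * x + y * y
```

A `False` answer is always correct, while a `True` answer can be wrong only if every sampled point happens to be a root of the difference polynomial.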
Algorithm
Statements with side-effect: x = e and Mem[e1] = e2
x = e
Record in the register table: Store(x) ← F(e)
Register table is simply an array
Mem[e1] = e2
Record in the memory table: F(e1) ← F(e2)
Expressions
E ::= n (Constant)
| x (Variable Reference)
| Mem[e] (Memory Read)
| e1 + e2 (Arithmetic)
| e1 - e2
| e1 * e2
| (e == 0) (Conditionals)
| (e ≥ 0)
| Ф(c: e1, ¬c: e2) (Joins)
| (e1,e2) (Joins at Loop Entry)
| (x) (Loop Exit)
| U(e1, …, en) (Uninterpreted Functions)
Arithmetic and Uninterpreted Functions
F(n) → [n, n, n]
F(x) → Store(x) if x was defined before use
| Rand() otherwise
F(e1 + e2) → F(e1) + F(e2)
F(e1 - e2) → F(e1) - F(e2)
F(e1 * e2) → F(e1) * F(e2)
F(e1 / e2) → F(e1) / F(e2)
F(U(e1, …, en)) → Rand(U, F(e1), …, F(en))
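The rules above can be sketched as a small evaluator over lists of three numbers. This is an illustration under stated assumptions, not the authors' implementation: the class name `Analyzer`, the tuple encoding of expressions, and the sample range are hypothetical; a random value chosen for an undefined variable is remembered so later uses agree; division is omitted; and any operator other than `+`, `-`, `*` is treated as an uninterpreted function symbol.

```python
import random

K = 3  # samples kept per expression, matching the three-element lists on the slides

class Analyzer:
    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        self.store = {}     # register table: variable -> list of K numbers
        self.uf_table = {}  # (symbol, argument samples) -> list of K numbers

    def _rand(self):
        return [self.rng.randrange(1, 1_000_003) for _ in range(K)]

    def F(self, e):
        """e is an int, a variable name, or a tuple (op, arg1, ..., argn)."""
        if isinstance(e, int):                  # F(n) -> [n, n, n]
            return [e] * K
        if isinstance(e, str):                  # Store(x), or Rand() if undefined
            if e not in self.store:
                self.store[e] = self._rand()    # assumption: remember the choice
            return self.store[e]
        op, *args = e
        vals = [self.F(a) for a in args]
        if op == '+':
            return [p + q for p, q in zip(*vals)]
        if op == '-':
            return [p - q for p, q in zip(*vals)]
        if op == '*':
            return [p * q for p, q in zip(*vals)]
        # Uninterpreted function: equal argument samples give equal results.
        key = (op, tuple(tuple(v) for v in vals))
        if key not in self.uf_table:
            self.uf_table[key] = self._rand()
        return self.uf_table[key]

    def assign(self, x, e):                     # x = e: Store(x) <- F(e)
        self.store[x] = self.F(e)

an = Analyzer(seed=1)
an.assign('t', ('+', 'x', 'y'))
same_value = an.F(('-', 't', 'y')) == an.F('x')                 # t - y vs x
same_call = an.F(('U', 't')) == an.F(('U', ('+', 'y', 'x')))    # U(t) vs U(y + x)
```

Keying the uninterpreted-function table on the argument samples is what lets U(t) and U(y + x) receive the same random list without any symbolic reasoning.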
Joins
F(Ф(c: e1, ¬c: e2)) = F(e1 ⊕r e2)
r = F(c)
e1 ⊕r e2 ≡ r × e1 + (1-r) × e2
Note that r + (1-r) = 1
Linear equalities are preserved
Furthermore, if 0 ≤ r ≤ 1
Linear inequalities will be preserved
Preservation of Linear Invariants
[Figure: two branches on c. One branch: y1 = …; x1 = a y1 + b. Other branch: y2 = …; x2 = a y2 + b. At the join: y = Ф(c: y1, ¬c: y2); x = Ф(c: x1, ¬c: x2); assert (x = ay + b).]
F(y) = r F(y1) + (1-r) F(y2)
F(x) = r F(x1) + (1-r) F(x2)
= r F(ay1 + b) + (1-r) F(ay2 + b)
= ar F(y1) + a(1-r) F(y2) + r b + (1-r) b
= a F(y) + b
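A small numeric check of the derivation above, using exact rationals. The concrete values (a = 2, b = 3, the branch values, and the weights) are arbitrary choices for illustration; `join` is a hypothetical helper name for the ⊕r operation.

```python
from fractions import Fraction

def join(r, v1, v2):
    """Affine join v1 ⊕r v2 = r*v1 + (1 - r)*v2."""
    return r * v1 + (1 - r) * v2

# Both branches satisfy x = 2*y + 3.
y1, x1 = Fraction(4), Fraction(11)
y2, x2 = Fraction(-1), Fraction(1)

# Equalities survive any weight, even one outside [0, 1].
r = Fraction(7, 3)
y, x = join(r, y1, y2), join(r, x1, x2)
equality_kept = (x == 2 * y + 3)

# The inequality x <= 20 holds on both branches, but it is only
# guaranteed after the join when 0 <= r <= 1.
inside = join(Fraction(1, 2), x1, x2) <= 20
outside = join(Fraction(7, 3), x1, x2) <= 20
```

Here `outside` is false because the weight 7/3 extrapolates beyond both branch values, which is why the slide restricts r to [0, 1] for inequalities.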
Locking Example (Joins)
[Figure: original program. Lock = L0; if C then Lock++; if C then Lock--; assert (Lock = L0).]
Locking Example (Joins)
[Figure: the same program in SSA form. L1 = L0; if C then L2 = L1 + 1; L3 = Ф(c: L2, ¬c: L1); if C then L4 = L3 - 1; L5 = Ф(c: L4, ¬c: L3); assert (L5 = L0).]
F(L1) = F(L0)
F(L2) = F(L1) + 1
F(L3) = r F(L2) + (1-r) F(L1)
= F(L0) + r
F(L4) = F(L3) - 1
F(L5) = r F(L4) + (1-r) F(L3)
= F(L3) - r
= F(L0) + r - r
= F(L0)
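The locking derivation can be replayed with concrete numbers. In this sketch the function name `lock_example` and the optional `r_second` parameter are hypothetical; the second parameter exists only to show what happens if the two joins were given different weights, which is not what the slides do (both tests are the same condition C, so they share r = F(C)).

```python
from fractions import Fraction

def join(r, v1, v2):
    """Affine join v1 ⊕r v2 = r*v1 + (1 - r)*v2."""
    return r * v1 + (1 - r) * v2

def lock_example(l0, r, r_second=None):
    """Return (F(L3), F(L5)) for the SSA program on the slide."""
    if r_second is None:
        r_second = r              # same condition C, hence the same weight
    l1 = l0
    l2 = l1 + 1
    l3 = join(r, l2, l1)          # L3 = Ф(c: L2, ¬c: L1)
    l4 = l3 - 1
    l5 = join(r_second, l4, l3)   # L5 = Ф(c: L4, ¬c: L3)
    return l3, l5

l0, r = Fraction(17), Fraction(2, 5)
l3, l5 = lock_example(l0, r)
_, l5_mismatch = lock_example(l0, r, r_second=Fraction(1, 7))
```

With a shared weight, `l5` equals `l0` and the assertion is validated; with unrelated weights the result differs, reflecting that two independent conditions would not guarantee the lock is restored.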
Content of Conditionals
[Figure: flowchart with program points P0 to P3. Statement: t = 2x - 3y. Tests: x - y == 5 ? and x + y == 15 ?, each with T and F branches. Final check: assert (t = 5).]
P0:
F(x) = [1, 2, 3]
F(y) = [1, 4, 9]
P1:
F(x) = [1, 2, 3]
F(y) = [1, 4, 9]
F(t) = F(2x - 3y)
= [-1, -8, -21]
Content of Conditionals
[Figure: branch on C; both the T and the F successor use y ("… = y + …").]
Content of Conditionals
Split F(y) into F(yT) and F(yF) such that
F(yT) = A(F(y))
F(yT) ⊕r F(yF) = F(y), where r = F(c)
A([v1, v2, v3]) → [v1 ⊕r1 v2, v2 ⊕r2 v3, v3 ⊕r3 v1]
[Figure: branch on C; the T successor uses yT ("… = yT + …") and the F successor uses yF ("… = yF + …").]
Example (Content of Conditionals)
[Figure: flowchart with program points P0 to P3. Statement: t = 2x - 3y. Tests: x - y == 5 ? and xT + yT == 15 ?, each with T and F branches. Final check: assert (tTT = 5).]
P1:
F(x) = [1, 2, 3]
F(y) = [1, 4, 9]
F(t) = [-1, -8, -21]
F(x - y - 5) = [-5, -7, -11]
Example (Content of Conditionals)
P2:
F(xT) = [-3/2, 1/4, -2/3]
F(yT) = [-13/2, -19/4, -17/3]
F(tT) = [33/2, 59/4, 47/3]
Note that
xT - yT = 5
tT + yT = 10
Because t = 2x - 3y
= 2(x-y) - y
= 10 - y
F(xT + yT - 15) = [-23, -39/2, -64/3]
Example (Content of Conditionals)
P3:
F(xTT) = [10, 10, 10]
F(yTT) = [5, 5, 5]
F(tTT) = [5, 5, 5]
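The numbers at P1, P2 and P3 can be reproduced with exact rational arithmetic. The sketch below rests on an assumption the slides do not state explicitly: each weight ri in A is chosen so that the tested expression evaluates to zero on the combined sample. That choice is consistent with every value shown above. The helper names `weights` and `adjust` are hypothetical.

```python
from fractions import Fraction

def join(r, v1, v2):
    """Affine join v1 ⊕r v2 = r*v1 + (1 - r)*v2."""
    return r * v1 + (1 - r) * v2

def weights(e):
    """Weights r_i with e[i] ⊕r_i e[i+1] == 0 (indices wrap around).

    From r*a + (1 - r)*b = 0 we get r = b / (b - a); assumes a != b.
    """
    n = len(e)
    return [e[(i + 1) % n] / (e[(i + 1) % n] - e[i]) for i in range(n)]

def adjust(v, rs):
    """A([v1, v2, v3]) = [v1 ⊕r1 v2, v2 ⊕r2 v3, v3 ⊕r3 v1]."""
    n = len(v)
    return [join(rs[i], v[i], v[(i + 1) % n]) for i in range(n)]

# P1: samples for x and y, and t = 2x - 3y
x = [Fraction(1), Fraction(2), Fraction(3)]
y = [Fraction(1), Fraction(4), Fraction(9)]
t = [2 * a - 3 * b for a, b in zip(x, y)]

# P2: true branch of x - y == 5
rs = weights([a - b - 5 for a, b in zip(x, y)])
xT, yT, tT = adjust(x, rs), adjust(y, rs), adjust(t, rs)

# P3: true branch of xT + yT == 15
rs = weights([a + b - 15 for a, b in zip(xT, yT)])
xTT, yTT, tTT = adjust(xT, rs), adjust(yT, rs), adjust(tT, rs)
```

Because the same weights are applied to x, y and t, every linear relation among them survives the adjustment, so once both tests are forced to hold, t collapses to the constant 5 in all three samples.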
Conditionals
F(c) → 1 (if our algorithm can prove that c is always true)
→ 0 (if our algorithm can prove that c is always false)
→ Rand(c) (equivalent conditionals get the same random value)
Let c be of the form: e == 0. Let F(e) = [v1, v2, v3]
e ≡ 0 ⇒ c is always true
Check: F(e) = F(0)
e ≡ n, n ≠ 0 ⇒ c is always false
Check: v1 = v2 = v3 ≠ 0
e ≡ n1 E + n2, 0 < n2 < n1 ⇒ c is always false
E.g., 2x + 1 ≠ 0
n1 = GCD { v1 - v2, v2 - v3 }
Check: n2 = v1 % n1 > 0
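The three checks above can be sketched as one classification function over integer samples. The function name `classify` and the returned strings are hypothetical, and the answers are probabilistic in the same sense as the rest of the analysis: they are judgements from three samples, not proofs.

```python
from math import gcd

def classify(v):
    """Classify the conditional (e == 0) from integer samples v = F(e)."""
    v1, v2, v3 = v
    if v1 == v2 == v3 == 0:
        return "always true"      # F(e) = F(0)
    if v1 == v2 == v3:
        return "always false"     # e looks like a non-zero constant
    n1 = gcd(v1 - v2, v2 - v3)    # candidate modulus; at least 1 here
    if v1 % n1 > 0:
        return "always false"     # e looks like n1*E + n2 with 0 < n2 < n1
    return "unknown"              # fall back to Rand(c)

xs = [1, 2, 3]                                  # samples for x
odd = classify([2 * x + 1 for x in xs])         # 2x + 1 == 0 ?
plain = classify([x + 1 for x in xs])           # x + 1 == 0 ?
```

For 2x + 1 the samples are [3, 5, 7], the differences share the factor 2, and the remainder 1 rules out zero; for x + 1 the GCD is 1 and nothing can be concluded.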
Detecting Equivalent Conditionals
To Check: (e1 == 0) ≡ (e2 == 0)
e1 ≡ e × (e2), e ≠ 0 ⇒ (e1 == 0) ≡ (e2 == 0)
E.g., (x + 1 == 0) ≡ (2x + 2 == 0)
e ≠ 0 can be checked if we know F(e)
F(e) = F(e1) / F(e2)
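A sketch of the simplest instance of this check, where the quotient e turns out to be a non-zero constant. Restricting to constant quotients is an assumption made to keep the example short; the slide only says that e ≠ 0 must be checkable from F(e). The name `same_zero_test` is hypothetical.

```python
from fractions import Fraction

def same_zero_test(f_e1, f_e2):
    """True if the samples suggest e1 = k * e2 for a non-zero constant k."""
    if any(b == 0 for b in f_e2):
        return False                      # quotient undefined at some sample
    ratios = [Fraction(a, b) for a, b in zip(f_e1, f_e2)]
    return ratios[0] != 0 and all(q == ratios[0] for q in ratios)

xs = [1, 2, 3]                            # samples for x
e1 = [x + 1 for x in xs]                  # x + 1
e2 = [2 * x + 2 for x in xs]              # 2x + 2
e3 = [x + 2 for x in xs]                  # x + 2
```

Here `same_zero_test(e1, e2)` holds because every quotient is 1/2, while `same_zero_test(e1, e3)` fails because the quotients differ across samples.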
Loops
F((x)) = F(x0 ⊕r1 xi+1)
xi+1 = g(xi)
r1 = Rand(c(xi))
xi = x0 ⊕r2 g(x0)
r2 = Rand()
Linear Loop Invariants are preserved
Automatic Discovery of Invariants
Automatic Use of Invariants
x = x0;
while c(x) { x = g(x); }
t = (x);
x = x0;
while c(x) { x = g(x); }
t = x;
Example (Loops)
[Figure: x = 0; y = 1; loop body x = x + 1; y = y + 2; loop test C(x) ?; after the loop, assert (y = 2x + 1).]
Example (Loops)
[Figure: the same loop, followed by x’ = (x); y’ = (y); assert (y’ = 2x’ + 1).]
F(x’) = F((x))
= F(0 ⊕r1 ((0 ⊕r2 (1)) + 1))
= r1 0 + (1-r1) ((r2 0 + (1-r2) 1) + 1)
= (1-r1) (1 - r2 + 1)
= 2 - 2r1 - r2 + r1r2
F(y’) = F((y))
= F(1 ⊕r1 ((1 ⊕r2 (3)) + 2))
= r1 1 + (1-r1) ((r2 1 + (1-r2) 3) + 2)
= r1 + (1-r1) (5 - 2r2)
= 5 - 4r1 - 2r2 + 2r1r2
Hence F(y’) = 2 F(x’) + 1, so the assertion holds.
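The two closed forms can be checked by evaluating the loop-exit expression directly. This is a sketch under the loop rule shown on the Loops slide; `loop_exit` is a hypothetical helper name, and the weights are arbitrary rationals.

```python
from fractions import Fraction

def join(r, v1, v2):
    """Affine join v1 ⊕r v2 = r*v1 + (1 - r)*v2."""
    return r * v1 + (1 - r) * v2

def loop_exit(init, step, r1, r2):
    """Loop-exit value: init ⊕r1 step(x_i), with x_i = init ⊕r2 step(init)."""
    x_i = join(r2, init, step(init))
    return join(r1, init, step(x_i))

r1, r2 = Fraction(3, 11), Fraction(-5, 2)
fx = loop_exit(Fraction(0), lambda v: v + 1, r1, r2)   # x = 0; x = x + 1
fy = loop_exit(Fraction(1), lambda v: v + 2, r1, r2)   # y = 1; y = y + 2
```

Both variables must use the same r1 and r2, since the weights belong to the loop rather than to an individual variable; that sharing is what carries the invariant y = 2x + 1 through to the exit values.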
Memory
M[x] = v6
M[y] = v
M[y+1] = v5
M[2z] = v4
M[2z+1] = v3
M[4z+3] = v2
M[2z+1] = v1
T1 = M[y]
M[y] = v
M[4z+3] = v2
M[2z+1] = v1
M[2z] = v4
T2 = M[y]
assert (T1 = T2)
Memory
F(Mem[a]) = F(v1 ⊕r1 v2 ⊕r2 … vn ⊕rn v)
= F(r1 v1 + r2 v2 + … + rn vn + (1 - r1 - r2 - … - rn) v)
ri = F(Conditions under which vi is read)
Example (Memory)
M[x] = v6
M[y] = v
M[y+1] = v5
M[2z] = v4
M[2z+1] = v3
M[4z+3] = v2
M[2z+1] = v1
T1 = M[y]
T1 = M[y]
= v1 if (y == 2z+1)
v2 if (y != 2z+1 ∧ y == 4z+3)
v4 if (y == 2z)
v otherwise
F(T1) = F(M[y])
= F(r1 v1 + r2 v2 + r4 v4 + (1 - r1 - r2 - r4) v)
where r1 = F(y == 2z+1)
r2 = F(y != 2z+1 ∧ y == 4z+3)
r4 = F(y == 2z)
Example (Memory)
M[y] = v
M[4z+3] = v2
M[2z+1] = v1
M[2z] = v4
T2 = M[y]
T2 = M[y]
= v4 if (y == 2z)
+ v1 if (y == 2z+1)
+ v2 if (y != 2z+1 ∧ y == 4z+3)
+ v otherwise
F(T2) = F(M[y])
= F(r4 v4 + r1 v1 + r2 v2 + (1 - r4 - r1 - r2) v)
where r4 = F(y == 2z)
r1 = F(y == 2z+1)
r2 = F(y != 2z+1 ∧ y == 4z+3)
Example (Memory)
M[x] = v6
M[y] = v
M[y+1] = v5
M[2z] = v4
M[2z+1] = v3
M[4z+3] = v2
M[2z+1] = v1
T1 = M[y]
M[y] = v
M[4z+3] = v2
M[2z+1] = v1
M[2z] = v4
T2 = M[y]
F(T1) = F(r1 v1 + r2 v2 + r4 v4 + (1 - r1 - r2 - r4) v)
F(T2) = F(r4 v4 + r1 v1 + r2 v2 + (1 - r4 - r1 - r2) v)
The two polynomials differ only in the order of their terms, so F(T1) = F(T2) and assert (T1 = T2) holds.
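A numeric sketch of the memory-read formula applied to T1 and T2. It assumes the weights r1, r2, r4 have already been assigned, one random value per condition and shared by both reads; how conditions are recognised as equivalent is outside this example. The helper name `mem_read` and the concrete numbers are hypothetical.

```python
from fractions import Fraction

def mem_read(cases, default):
    """sum(r_i * v_i) + (1 - sum(r_i)) * default, for cases = [(r_i, v_i), ...]."""
    total_r = sum(r for r, _ in cases)
    return sum(r * v for r, v in cases) + (1 - total_r) * default

# One random weight per condition, shared by both reads.
r1 = Fraction(2, 7)     # F(y == 2z+1)
r2 = Fraction(-3, 5)    # F(y != 2z+1 ∧ y == 4z+3)
r4 = Fraction(9, 4)     # F(y == 2z)

# Arbitrary sample values for the stored contents.
v, v1, v2, v4 = Fraction(100), Fraction(11), Fraction(22), Fraction(44)

f_t1 = mem_read([(r1, v1), (r2, v2), (r4, v4)], v)   # order of the first program
f_t2 = mem_read([(r4, v4), (r1, v1), (r2, v2)], v)   # order of the second program
```

Because the formula is a plain weighted sum, the order in which the writes were encountered does not affect the result, which is how the reordered stores in the second program yield the same value for the read.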
Applications
Program Verification
Automatic discovery of useful loop invariants
Translation Validation
Compiler Optimizations
Eliminating redundant computations, branches, memory reads.
Partial Evaluation
Interactive Debugging and Testing of Programs
Related Light-weight Techniques
Value Numbering
Targets Structural Equivalence of expressions
Detects only equalities
Random Testing
Cannot ‘prove’ equivalence of expressions; can only provide a counter-example
Exponential number of paths
Even generating input data to execute a particular path is difficult
Conclusion
Comparison with Symbolic Analysis
Very simple data structure: list of numbers with simple operations and fast judgements
There is a limit to what a linear time analysis can achieve!
Excellent base to build up more complicated analysis
Join lazily
“The intriguing possibility that axioms of randomness may constitute a useful fundamental source of truth independent of, but supplementary to, the standard axiomatic structure of mathematics suggests that probabilistic algorithms ought to be sought vigorously.”
- J.T. Schwartz