data flow analysis foundations
DESCRIPTION
Data Flow Analysis Foundations. Guo, Yao. Part of the slides are adapted from MIT 6.035 “Computer Language Engineering”. Dataflow Analysis. Compile-Time Reasoning About Run-Time Values of Variables or Expressions At Different Program Points - PowerPoint PPT PresentationTRANSCRIPT
School of EECS, Peking University“Advanced Compiler Techniques” (Fall 2011)
Data Flow Data Flow AnalysisAnalysis
FoundationsFoundationsGuo, YaoGuo, Yao
Part of the slides are adapted from MIT 6.035 “Computer Language Engineering”
2Fall 2011“Advanced Compiler
Techniques”
Dataflow AnalysisDataflow Analysis Compile-Time Reasoning AboutCompile-Time Reasoning About
Run-Time Values of Variables or Run-Time Values of Variables or ExpressionsExpressions
At Different Program PointsAt Different Program Points Which assignment statements produced Which assignment statements produced
value of variable at this point?value of variable at this point? Which variables contain values that are Which variables contain values that are
no longer used after this program point?no longer used after this program point? What is the range of possible values of What is the range of possible values of
variable at this program point?variable at this program point?
3Fall 2011“Advanced Compiler
Techniques”
Program RepresentationProgram Representation Control Flow GraphControl Flow Graph
Nodes N Nodes N –– statements of program statements of program Edges E Edges E –– flow of control flow of control
pred(n) = set of all predecessors of npred(n) = set of all predecessors of n succ(n) = set of all successors of nsucc(n) = set of all successors of n
Start node nStart node n00 Set of final nodes NSet of final nodes Nfinalfinal
Could also add special nodes ENTRY & EXITCould also add special nodes ENTRY & EXIT
4Fall 2011“Advanced Compiler
Techniques”
Program PointsProgram Points One program point before each nodeOne program point before each node One program point after each nodeOne program point after each node Join point Join point –– point with multiple point with multiple
predecessorspredecessors Split point Split point –– point with multiple point with multiple
successorssuccessors
5Fall 2011“Advanced Compiler
Techniques”
Basic IdeaBasic Idea Information about program Information about program
represented using values from represented using values from algebraic structure called algebraic structure called latticelattice
Analysis produces lattice value for Analysis produces lattice value for each program pointeach program point
Two flavors of analysisTwo flavors of analysis Forward dataflow analysisForward dataflow analysis Backward dataflow analysisBackward dataflow analysis
6Fall 2011“Advanced Compiler
Techniques”
Forward Dataflow Forward Dataflow AnalysisAnalysis
Analysis propagates values forward through control Analysis propagates values forward through control flow graph with flow of controlflow graph with flow of control Each node has a transfer function fEach node has a transfer function f
Input Input –– value at program point before node value at program point before node Output Output –– new value at program point after node new value at program point after node
Values flow from program points after predecessor Values flow from program points after predecessor nodes to program points before successor nodesnodes to program points before successor nodes
At join points, values are combined using a merge At join points, values are combined using a merge function function
Canonical Example: Reaching DefinitionsCanonical Example: Reaching Definitions
7Fall 2011“Advanced Compiler
Techniques”
Backward Dataflow Backward Dataflow AnalysisAnalysis
Analysis propagates values backward through Analysis propagates values backward through control flow graph against flow of controlcontrol flow graph against flow of control Each node has a transfer function fEach node has a transfer function f
Input Input –– value at program point after node value at program point after node Output Output –– new value at program point before new value at program point before
nodenode Values flow from program points before Values flow from program points before
successor nodes to program points after successor nodes to program points after predecessor nodespredecessor nodes
At split points, values are combined using a At split points, values are combined using a merge functionmerge function
Canonical Example: Live VariablesCanonical Example: Live Variables
8Fall 2011“Advanced Compiler
Techniques”
Partial OrdersPartial Orders Set PSet P Partial order Partial order such that such that x,y,zx,y,zPP
x x x x (reflexive)(reflexive) x x y and y y and y x implies x x implies x y y (asymmetric)(asymmetric) x x y and y y and y z implies x z implies x z z (transitive)(transitive)
Can use partial order to defineCan use partial order to define Upper and lower boundsUpper and lower bounds Least upper boundLeast upper bound Greatest lower boundGreatest lower bound
9Fall 2011“Advanced Compiler
Techniques”
Upper BoundsUpper Bounds If S If S P then P then
xxP is an upper bound of S if P is an upper bound of S if yyS. y S. y xx
xxP is the least upper bound of S ifP is the least upper bound of S if x is an upper bound of S, and x is an upper bound of S, and x x y for all upper bounds y of S y for all upper bounds y of S
- join, least upper bound (lub), - join, least upper bound (lub), supremum, supsupremum, sup S is the least upper bound of SS is the least upper bound of S x x y is the least upper bound of {x,y} y is the least upper bound of {x,y}
10Fall 2011“Advanced Compiler
Techniques”
LowerLower BoundsBounds If S P then
xP is a lower bound of S if yS. x y xP is the greatest lower bound of S if
x is a lower bound of S, and y x for all lower bounds y of S
- meet, greatest lower bound (glb), infimum, inf
S is the greatest lower bound of S x y is the greatest lower bound of {x,y}
11Fall 2011“Advanced Compiler
Techniques”
CoveringCovering xx y if x y if x y and x y and xy y x is covered by y (y covers x) ifx is covered by y (y covers x) if
x x y, and y, and x x z z y implies x y implies x z z
Conceptually, y covers x if there are Conceptually, y covers x if there are no elements between x and yno elements between x and y
12Fall 2011“Advanced Compiler
Techniques”
ExampleExample P = { 000, 001, 010, 011, 100, 101, 110, 111}P = { 000, 001, 010, 011, 100, 101, 110, 111}
(standard Boolean lattice, also called hypercube)(standard Boolean lattice, also called hypercube) x x y if (x bitwise and y) = x y if (x bitwise and y) = x
111
011101
110
010001
000
100
Hasse Diagram• If y covers x
• Line from y to x• y above x in
diagram
13Fall 2011“Advanced Compiler
Techniques”
LatticesLattices If x If x y and x y and x y exist for all x,y y exist for all x,yP, P,
then P is a lattice.then P is a lattice. If If S and S and S exist for all S S exist for all S P, P,
then P is a complete lattice.then P is a complete lattice. All finite lattices are completeAll finite lattices are complete
14Fall 2011“Advanced Compiler
Techniques”
LatticesLattices If x If x y and x y and x y exist for all x,y y exist for all x,yP, P, then P then P
is a lattice.is a lattice. If If S and S and S exist for all S S exist for all S P, P, then P is a then P is a
complete lattice.complete lattice. All finite lattices are completeAll finite lattices are complete Example of a lattice that is not completeExample of a lattice that is not complete
Integers IIntegers I For any x, yFor any x, yI, x I, x y = max(x,y), x y = max(x,y), x y = min(x,y) y = min(x,y) But But I and I and I do not exist I do not exist I I { {,, } is a complete lattice } is a complete lattice
15Fall 2011“Advanced Compiler
Techniques”
Lattice ExamplesLattice Examples LatticesLattices
Non-latticesNon-lattices
16Fall 2011“Advanced Compiler
Techniques”
Semi-LatticeSemi-Lattice Only one of the two binary Only one of the two binary
operations (meet or join) existoperations (meet or join) exist Meet-semilatticeMeet-semilattice If x If x y exist for all x,y y exist for all x,yPP Join-semilattice Join-semilattice If x If x y exist for all x,y y exist for all x,yPP
17Fall 2011“Advanced Compiler
Techniques”
Top and BottomTop and Bottom Greatest element of P (if it exists) is Greatest element of P (if it exists) is
top (top (⊤⊤)) Least element of P (if it exists) is Least element of P (if it exists) is
bottom (bottom ())
18Fall 2011“Advanced Compiler
Techniques”
Connection Between Connection Between , , , , and and
The following 3 properties are equivalent:The following 3 properties are equivalent: x x y y x x y y y y x x y y x x
Will prove:Will prove: x x y implies x y implies x y y y and x y and x y y x x x x y y y implies x y implies x y y x x y y x implies x x implies x y y
Then by transitivity, can obtain Then by transitivity, can obtain x x y y y implies x y implies x y y x x x x y y x implies x x implies x y y y y
19Fall 2011“Advanced Compiler
Techniques”
Connecting Lemma Connecting Lemma ProofsProofs
Proof of x Proof of x y implies x y implies x y y y y x x y implies y is an upper bound of {x,y}. y implies y is an upper bound of {x,y}. Any upper bound z of {x,y} must satisfy y Any upper bound z of {x,y} must satisfy y z. z. So y is least upper bound of {x,y} and x So y is least upper bound of {x,y} and x y y y y
Proof of x Proof of x y implies x y implies x y y x x x x y implies x is a lower bound of {x,y}. y implies x is a lower bound of {x,y}. Any lower bound z of {x,y} must satisfy z Any lower bound z of {x,y} must satisfy z x. x. So x is greatest lower bound of {x,y} and x So x is greatest lower bound of {x,y} and x y y
x x
20Fall 2011“Advanced Compiler
Techniques”
Connecting Lemma Connecting Lemma ProofsProofs
Proof of x Proof of x y y y implies x y implies x y y y is an upper bound of {x,y} implies x y is an upper bound of {x,y} implies x
yy Proof of x Proof of x y y x implies x x implies x y y
x is a lower bound of {x,y} implies x x is a lower bound of {x,y} implies x y y
21Fall 2011“Advanced Compiler
Techniques”
Lattices as Algebraic Lattices as Algebraic StructuresStructures
Have defined Have defined and and in terms of in terms of Will now define Will now define in terms of in terms of and and
Start with Start with and and as arbitrary algebraic as arbitrary algebraic operations that satisfy associative, operations that satisfy associative, commutative, idempotence, and absorption commutative, idempotence, and absorption lawslaws
Will define Will define using using and and Will show that Will show that is a partial order is a partial order
Intuitive concept of Intuitive concept of and and as information as information combination operators (or, and)combination operators (or, and)
22Fall 2011“Advanced Compiler
Techniques”
Lattice as AlgebraLattice as Algebra定理定理 设 设 <<SS, , ∗, , ∗ ∘∘>> 是具有两个二元运算 和∗是具有两个二元运算 和∗ 的代数系统的代数系统 , , 若对于 和∗若对于 和∗ 满足交换律、结合满足交换律、结合律、吸收律律、吸收律 , , 则可以适当定义则可以适当定义 SS 中的偏序≼中的偏序≼ , , 使得 使得 <<SS, >≼, >≼ 构成格构成格 , , 且且 aa,,bb∈∈SS 有有 aa∧∧b b = = aa∗∗bb, , aa∨∨b b = = aa ∘∘ bb..定义定义 设 设 <<SS, ,∗, ,∗ ∘∘>> 是代数系统是代数系统 , ∗, ∗ 和和 ∘∘是二元是二元运算运算 , , 如果 和∗如果 和∗ ∘∘ 运算满足交换律、结合律运算满足交换律、结合律和吸收律和吸收律 , , 则则 <<SS, ,∗, ,∗ ∘∘>> 构成格构成格 ..
23Fall 2011“Advanced Compiler
Techniques”
Algebraic Properties of Algebraic Properties of LatticesLattices
Assume arbitrary operations Assume arbitrary operations and and such that such that (x (x y) y) z z x x (y (y z) z) (associativity of (associativity of )) (x (x y) y) z z x x (y (y z) z) (associativity of (associativity of )) x x y y y y x x (commutativity of (commutativity of )) x x y y y y x x (commutativity of (commutativity of )) x x x x x x (idempotence of (idempotence of )) x x x x x x (idempotence of (idempotence of )) x x (x (x y) y) x x (absorption of (absorption of over over )) x x (x (x y) y) x x (absorption of (absorption of over over ))
24Fall 2011“Advanced Compiler
Techniques”
Connection Between Connection Between and and
x x y y y if and only if x y if and only if x y y x x Proof of x Proof of x y y y implies x = x y implies x = x y y
x = x x = x (x (x y) y) (by absorption)(by absorption) = x = x y y (by assumption)(by assumption)
Proof of x Proof of x y y x implies y = x x implies y = x y yy = y y = y (y (y x) x) (by absorption)(by absorption) = y = y (x (x y) y) (by commutativity)(by commutativity) = y = y x x (by assumption)(by assumption) = x = x y y (by commutativity)(by commutativity)
25Fall 2011“Advanced Compiler
Techniques”
Properties of Properties of Define x Define x y if x y if x y y y y Proof of transitive property. Must Proof of transitive property. Must
show thatshow that x x y y y and y y and y z z z implies x z implies x z z zz
x x z = x z = x (y (y z) z) (by assumption)(by assumption) = (x = (x y) y) z z (by associativity)(by associativity) = y = y z z (by assumption)(by assumption) = z= z (by assumption)(by assumption)
26Fall 2011“Advanced Compiler
Techniques”
Properties of Properties of Proof of asymmetry property. Must show Proof of asymmetry property. Must show
thatthatx x y y y and y y and y x x x implies x x implies x y y
x = y x = y x x (by assumption)(by assumption) = x = x y y (by commutativity)(by commutativity) = y = y (by assumption)(by assumption)
Proof of reflexivity property. Must show Proof of reflexivity property. Must show thatthatx x x x x x x x x x x x (by idempotence)(by idempotence)
27Fall 2011“Advanced Compiler
Techniques”
Monotonic Function & Fixed Monotonic Function & Fixed pointpoint
Let Let LL be a lattice. be a lattice. A function A function ff : : L L → → LL is is monotonic ifmonotonic if
∀∀x, yx, y ∈ S : ∈ S : xx yy ⇒ ⇒ f f ((xx) ) f f ((yy))
Let Let AA be a set, be a set, ff : : AA → → AA a function, a function, aa ∈ ∈AA ..If If f f ((aa) = ) = aa, then , then aa is called a is called a fixed pointfixed point of of f f on on AA
28Fall 2011“Advanced Compiler
Techniques”
Existence of Fixed PointsExistence of Fixed Points• The The heightheight of a lattice is defined to of a lattice is defined to
be the length of the longest path be the length of the longest path from from ⊥⊥ to to ⊤⊤
• In a complete lattice In a complete lattice LL with finite with finite height, every monotonic function height, every monotonic function ff : : LL → → LL has a has a uniqueunique least fixed-pointleast fixed-point : :
fix fix ((f f ) = ) =
0( )i
if
29Fall 2011“Advanced Compiler
Techniques”
Another Existence Another Existence TheoremTheorem
A function A function ff : : LL → → LL on a complete on a complete lattice lattice LL is called continuous if for is called continuous if for any subset any subset XX of of LL, , f f ((XX) = ) = f f ((XX). ).
Let Let LL be a complete lattice. If be a complete lattice. If ff : : LL → → LL is monotonic and continuous, then is monotonic and continuous, then ff has a has a uniqueunique least fixed-pointleast fixed-point
fix fix ((f f ) = ) =
0( )i
if
30Fall 2011“Advanced Compiler
Techniques”
Knaster-Tarski Knaster-Tarski Fixed Point TheoremFixed Point Theorem
Suppose (L, Suppose (L, ) is a complete lattice, ) is a complete lattice, ff: L: LL is a monotonic function.L is a monotonic function.
Then the fixed point m of Then the fixed point m of ff can be can be defined asdefined as
31Fall 2011“Advanced Compiler
Techniques”
Calculating Fixed PointCalculating Fixed Point The time complexity of computing a The time complexity of computing a
fixed-point depends on three factors:fixed-point depends on three factors: The height of the lattice, since this The height of the lattice, since this
provides a bound for i;provides a bound for i; The cost of computing f;The cost of computing f; The cost of testing equality.The cost of testing equality.
The computation of a fixed-point can be illustrated as a walk up the lattice starting at ⊥:
32Fall 2011“Advanced Compiler
Techniques”
Fixed Point SolutionFixed Point SolutionLet L be a lattice with finite height. Let L be a lattice with finite height. An An equation systemequation system is of the form: is of the form:
xx11 = F = F11(x(x11, . . . , x, . . . , xnn))xx22 = F = F22(x(x11, . . . , x, . . . , xnn))......xxnn = F = Fnn(x(x11, . . . , x, . . . , xnn))where xwhere xii are variables and F are variables and Fii : L : Lnn → L is a → L is a
collection of monotone functions.collection of monotone functions.The solution can be obtained using the The solution can be obtained using the least least
fixed-pointfixed-point of the function F : L of the function F : Lnn → L → Lnn defined by:defined by:
F(xF(x11,...,x,...,xnn) = (F) = (F11(x(x11,...,x,...,xnn), ..., F), ..., Fnn(x(x11,...,x,...,xnn))))
33Fall 2011“Advanced Compiler
Techniques”
AlgorithmAlgorithm The naive algorithm is then to proceed as follows:
x = (⊥, . . . ,⊥);do { t = x; x = F(x); }while (x ≠ t);
to compute the fixed-point x.
A better algorithm:x1 = ⊥; . . . xn = ⊥;do {
t1 = x1; . . . tn = xn;x1 = F1(x1, . . . , xn);. . .xn = Fn(x1, . . . , xn);
} while (x1 ≠ t1 ∨ . . . ∨ xn ≠ tn) to compute the fixed-point (x1, . . . , xn).
Why Better?
34Fall 2011“Advanced Compiler
Techniques”
ChainsChains A set S is a A set S is a chainchain if if x,yx,yS. y S. y x or x x or x
y y P has no infinite chains if every chain in P has no infinite chains if every chain in
P is finiteP is finite P satisfies the P satisfies the ascending chainascending chain
conditioncondition if for all sequences x if for all sequences x11 x x22 ……there exists n, such that xthere exists n, such that xnn = x = xn+1n+1 = = ……
35Fall 2011“Advanced Compiler
Techniques”
Application to Dataflow Application to Dataflow AnalysisAnalysis
Dataflow information will be lattice Dataflow information will be lattice valuesvalues Transfer functions operate on lattice valuesTransfer functions operate on lattice values Solution algorithm will generate increasing Solution algorithm will generate increasing
sequence of values at each program pointsequence of values at each program point Ascending chain condition will ensure Ascending chain condition will ensure
terminationtermination Will use Will use to combine values at control- to combine values at control-
flow join pointsflow join points
36Fall 2011“Advanced Compiler
Techniques”
Transfer FunctionsTransfer Functions Transfer function Transfer function ff: P: PP for each P for each
node in control flow graphnode in control flow graph ff models effect of the node on the models effect of the node on the
program informationprogram information
37Fall 2011“Advanced Compiler
Techniques”
Transfer FunctionsTransfer FunctionsEach dataflow analysis problem has a Each dataflow analysis problem has a
set F of transfer functions f: Pset F of transfer functions f: PPP Identity function iIdentity function iFF F must be closed under composition: F must be closed under composition:
f,gf,gF. the function h = F. the function h = x.f(g(x)) x.f(g(x)) FF Each f Each f F must be monotone:F must be monotone:
x x y implies f(x) y implies f(x) f(y) f(y) Sometimes all fSometimes all fF are distributive: F are distributive:
f(x f(x y) = f(x) y) = f(x) f(y) f(y) Distributivity implies monotonicityDistributivity implies monotonicity
38Fall 2011“Advanced Compiler
Techniques”
Distributivity Implies Distributivity Implies MonotonicityMonotonicity
Proof of distributivity implies Proof of distributivity implies monotonicitymonotonicity
Assume f(x Assume f(x y) = f(x) y) = f(x) f(y) f(y) Must show: x Must show: x y = y implies f(x) y = y implies f(x) f(y) f(y)
= f(y)= f(y)f(y) = f(x f(y) = f(x y) y) (by assumption)(by assumption) = f(x) = f(x) f(y) f(y) (by distributivity)(by distributivity)
39Fall 2011“Advanced Compiler
Techniques”
Putting Pieces TogetherPutting Pieces Together Forward Dataflow Analysis Forward Dataflow Analysis
FrameworkFramework Simulates execution of program Simulates execution of program
forward with flow of controlforward with flow of control
40Fall 2011“Advanced Compiler
Techniques”
Forward Dataflow Forward Dataflow AnalysisAnalysis
Simulates execution of program forward with Simulates execution of program forward with flow of controlflow of control
For each node n, haveFor each node n, have ininnn –– value at program point before n value at program point before n outoutnn –– value at program point after n value at program point after n ffnn –– transfer function for n (given in transfer function for n (given innn, computes , computes
outoutnn)) Require that solution satisfyRequire that solution satisfy
n. outn. outnn = f = fnn(in(innn)) n n n n00. in. inn n = = { out { outmm . m in pred(n) } . m in pred(n) } ininn0n0 = I = I
Where I summarizes information at start of programWhere I summarizes information at start of program
41Fall 2011“Advanced Compiler
Techniques”
Dataflow EquationsDataflow Equations Compiler processes program to Compiler processes program to
obtain a set of dataflow equationsobtain a set of dataflow equationsoutoutnn := f := fnn(in(innn))
ininnn := := { out { outmm . m in pred(n) } . m in pred(n) } Conceptually separates analysis Conceptually separates analysis
problem from programproblem from program
42Fall 2011“Advanced Compiler
Techniques”
Worklist Algorithm for Worklist Algorithm for Solving Forward Dataflow Solving Forward Dataflow
EquationsEquationsfor each n do outfor each n do outnn := f := fnn(())ininn0n0 := I; out := I; outn0 n0 := f:= fn0n0(I)(I)worklist := N - { nworklist := N - { n00 } }while worklist while worklist do do
remove a node n from worklistremove a node n from worklistininnn := := { out { outmm . m in pred(n) } . m in pred(n) }outoutnn := f := fnn(in(innn))if outif outnn changed then changed then worklist := worklist worklist := worklist succ(n) succ(n)
43Fall 2011“Advanced Compiler
Techniques”
Correctness ArgumentCorrectness Argument Why result satisfies dataflow equationsWhy result satisfies dataflow equations Whenever process a node Whenever process a node nn, set , set
outoutnn := f := fnn(in(innn) ) Algorithm ensures that outAlgorithm ensures that outnn = f = fnn(in(innn) )
Whenever outWhenever outm m changes, put succ(m) on changes, put succ(m) on worklist. Consider any node n worklist. Consider any node n succ(m). It will succ(m). It will eventually come off worklist and algorithm will eventually come off worklist and algorithm will set set ininnn := := { out { outmm . m in pred(n) } . m in pred(n) } to ensure to ensure that that ininnn = = { out { outmm . m in pred(n) } . m in pred(n) }
So final solution will satisfy dataflow equations So final solution will satisfy dataflow equations
44Fall 2011“Advanced Compiler
Techniques”
Termination ArgumentTermination Argument Why does algorithm terminate?Why does algorithm terminate? Sequence of values taken on by inSequence of values taken on by innn or or
outoutn n is a chain. If values stop increasing, is a chain. If values stop increasing, worklist empties and algorithm worklist empties and algorithm terminates.terminates.
If lattice has ascending chain property, If lattice has ascending chain property, algorithm terminatesalgorithm terminates Algorithm terminates for finite latticesAlgorithm terminates for finite lattices For lattices without ascending chain property, For lattices without ascending chain property,
use widening operatoruse widening operator
45Fall 2011“Advanced Compiler
Techniques”
Reaching DefinitionsReaching Definitions P = powerset of set of all definitions in program P = powerset of set of all definitions in program
(all subsets of set of definitions in program)(all subsets of set of definitions in program) = = (order is (order is )) = = I = inI = inn0n0 = = F = all functions f of the form f(x) = a F = all functions f of the form f(x) = a (x-b) (x-b)
b is set of definitions that node killsb is set of definitions that node kills a is set of definitions that node generatesa is set of definitions that node generates
General pattern for many transfer functionsGeneral pattern for many transfer functions f(x) = GEN f(x) = GEN (x-KILL) (x-KILL)
46Fall 2011“Advanced Compiler
Techniques”
Does Reaching Definitions Does Reaching Definitions Framework Satisfy Framework Satisfy
Properties?Properties? satisfies conditions for satisfies conditions for x x y and y y and y z implies x z implies x z (transitivity) z (transitivity) x x y and y y and y x implies y = x (asymmetry) x implies y = x (asymmetry) x x x x (idempotence) (idempotence)
F satisfies transfer function conditionsF satisfies transfer function conditions x.x. (x- (x- ) = ) = x.xx.xF F (identity)(identity) Will show f(x Will show f(x y) = f(x) y) = f(x) f(y) f(y) (distributivity)(distributivity)
f(x) f(x) f(y) = (a f(y) = (a (x (x –– b)) b)) (a (a (y (y –– b)) b)) = a = a (x (x –– b) b) (y (y –– b) = a b) = a ((x ((x y) y) ––
b)b) = f(x = f(x y) y)
47Fall 2011“Advanced Compiler
Techniques”
Does Reaching Definitions Does Reaching Definitions Framework Satisfy Framework Satisfy
Properties?Properties? What about composition?What about composition?
Given fGiven f11(x) = a(x) = a11 (x-b (x-b11) and f) and f22(x) = a(x) = a22 (x-b (x-b22)) Must show fMust show f11(f(f22(x)) can be expressed as a (x)) can be expressed as a (x (x
- b)- b)ff11(f(f22(x)) = a(x)) = a11 ((a ((a22 (x-b (x-b22)) - b)) - b11)) = a= a11 ((a ((a22 - b - b11) ) ((x-b ((x-b22) - b) - b11)))) = (a= (a11 (a (a22 - b - b11)) )) ((x-b ((x-b22) - b) - b11)))) = (a= (a11 (a (a22 - b - b11)) )) (x-(b (x-(b22 b b11))))
Let a = (aLet a = (a11 (a (a22 - b - b11)) and b = b)) and b = b22 b b11 Then fThen f11(f(f22(x)) = a (x)) = a (x (x –– b) b)
48Fall 2011“Advanced Compiler
Techniques”
Available ExpressionsAvailable Expressions P = powerset of set of all expressions in P = powerset of set of all expressions in
program (all subsets of set of expressions)program (all subsets of set of expressions) = = (order is (order is )) = P = P I = inI = inn0n0 = = F = all functions f of the form f(x) = a F = all functions f of the form f(x) = a
(x-b)(x-b) b is set of expressions that node killsb is set of expressions that node kills a is set of expressions that node generatesa is set of expressions that node generates
Another GEN/KILL analysisAnother GEN/KILL analysis
49Fall 2011“Advanced Compiler
Techniques”
Concept of ConservatismConcept of Conservatism Reaching definitions use Reaching definitions use as join as join
Optimizations must take into account all Optimizations must take into account all definitions that reach along ANY pathdefinitions that reach along ANY path
Available expressions use Available expressions use as join as join Optimization requires expression to Optimization requires expression to
reach along ALL pathsreach along ALL paths Optimizations must conservatively Optimizations must conservatively
take all possible executions into take all possible executions into account. account.
50Fall 2011“Advanced Compiler
Techniques”
Backward Dataflow Analysis
Simulates execution of program backward against the flow of control
For each node n, have inn – value at program point before n outn – value at program point after n fn – transfer function for n (given outn, computes
inn) Require that solution satisfies
n. inn = fn(outn) n Nfinal. outn = { inm . m in succ(n) } n Nfinal = outn = O Where O summarizes information at end of
program
51Fall 2011“Advanced Compiler
Techniques”
Worklist Algorithm for Solving Backward Dataflow
Equationsfor each n do inn := fn()for each n Nfinal do outn := O; inn := fn(O)worklist := N - Nfinal while worklist do
remove a node n from worklistoutn := { inm . m in succ(n) }inn := fn(outn)if inn changed then worklist := worklist pred(n)
52Fall 2011“Advanced Compiler
Techniques”
Live VariablesLive Variables P = powerset of set of all variables in P = powerset of set of all variables in
program (all subsets of set of variables in program (all subsets of set of variables in program)program)
= = (order is (order is )) = = O = O = F = all functions f of the form f(x) = a F = all functions f of the form f(x) = a
(x-b)(x-b) b is set of variables that node killsb is set of variables that node kills a is set of variables that node readsa is set of variables that node reads
53Fall 2011“Advanced Compiler
Techniques”
Very Busy ExpressionsVery Busy Expressions An expression is An expression is very busyvery busy if it will if it will definitelydefinitely
be evaluated again before its value changes.be evaluated again before its value changes. Also called “anticipated” expressionsAlso called “anticipated” expressions
We need the same lattice and auxiliary We need the same lattice and auxiliary functions as for available expressions.functions as for available expressions.
For every CFG node v the variable [v] For every CFG node v the variable [v] denotes the set of expressions that at the denotes the set of expressions that at the program point before the node definitely are program point before the node definitely are busy.busy.
54Fall 2011“Advanced Compiler
Techniques”
Formulate the AnalysisFormulate the Analysis P = powerset of set of all expressions in P = powerset of set of all expressions in
program (all subsets of set of expressions)program (all subsets of set of expressions) = = (order is (order is )) = P = P I = inI = inExitExit = = F = all functions f of the form f(x) = a F = all functions f of the form f(x) = a (x- (x-
b)b) b is set of expressions that node killsb is set of expressions that node kills a is set of expressions that node evaluatesa is set of expressions that node evaluates
55Fall 2011“Advanced Compiler
Techniques”
ExampleExamplevar x,a,b;var x,a,b;x = input;x = input;a = x-1;a = x-1;b = x-2;b = x-2;while (x>0) {while (x>0) {
output output a*ba*b-x;-x;x = x-1;x = x-1;
}}output a*b;output a*b;
var x,a,b,atimesb;x = input;a = x-1;b = x-2;atimesb = a*b;while (x>0) {
output atimesb-x;
x = x-1;}
output atimesb;
56Fall 2011“Advanced Compiler
Techniques”
Comparisons of Four Comparisons of Four AnalysesAnalyses
Reaching DefinitionsReaching Definitions Available ExpressionsAvailable Expressions Live VariablesLive Variables Very Busy ExpressionsVery Busy Expressions
57Fall 2011“Advanced Compiler
Techniques”
Forwards vs. backwardsForwards vs. backwards A A forwardsforwards analysis is one that for each analysis is one that for each
program point computes information program point computes information about the about the pastpast behavior. behavior. Examples of this are available expressions and Examples of this are available expressions and
reaching definitions.reaching definitions. Calculation: Calculation: predecessorspredecessors of CFG nodes. of CFG nodes.
A A backwardsbackwards analysis is one that for each analysis is one that for each program point computes information program point computes information about the about the futurefuture behavior. behavior. Examples of this are liveness and very busy Examples of this are liveness and very busy
expressions.expressions. Calculation: Calculation: successorssuccessors of CFG nodes. of CFG nodes.
58Fall 2011“Advanced Compiler
Techniques”
May vs. MustMay vs. Must A A maymay analysis is one that describes analysis is one that describes
information that may possibly be true and, information that may possibly be true and, thus, computes an thus, computes an upperupper approximation. approximation. Examples of this are liveness and reaching Examples of this are liveness and reaching
definitions.definitions. Calculation: Calculation: unionunion operator. operator.
A A mustmust analysis is one that describes analysis is one that describes information that must definitely be true information that must definitely be true and, thus, computes a and, thus, computes a lower lower approximation. approximation. Examples of this are available expressions and Examples of this are available expressions and
very busy expressions.very busy expressions. Calculation: Calculation: intersectionintersection operator. operator.
59Fall 2011“Advanced Compiler
Techniques”
Classification of AnalysesClassification of Analyses
ForwardsForwards BackwardsBackwards
MayMay Reaching Reaching DefinitionsDefinitions Live VariablesLive Variables
MustMust Available Available ExpressionsExpressions
Very Busy Very Busy ExpressionsExpressions
60Fall 2011“Advanced Compiler
Techniques”
Example: Initialization Example: Initialization Try define an analysis that ensures
that variables are initialized before they are read.
Lattice? Powerset of variables in the given
program Forwards or Backwards?
Forwards May or Must?
MustMust
61Fall 2011“Advanced Compiler
Techniques”
Meaning of Dataflow Meaning of Dataflow ResultsResults
Concept of program state s for control-flow graphsConcept of program state s for control-flow graphs• Program point n where execution located Program point n where execution located
(n is node that will execute next) (n is node that will execute next)• Values of variables in programValues of variables in program
Each execution generates a trajectory of states:Each execution generates a trajectory of states: ss00;s;s11;;……;s;skk,where each s,where each sii STST ssi+1i+1 generated from s generated from si i by executing basic block to by executing basic block to
Update variable valuesUpdate variable values Obtain new program point nObtain new program point n
62Fall 2011“Advanced Compiler
Techniques”
Relating States to Analysis Result
Meaning of analysis results is given by an abstraction function AF:STP
Correctness condition: require that for all states s
AF(s) inn where n is the next statement to
execute in state s
63Fall 2011“Advanced Compiler
Techniques”
Sign Analysis ExampleSign Analysis Example Sign analysis - compute sign of each Sign analysis - compute sign of each
variable vvariable v Base Lattice: P = flat lattice on {-,0,+}Base Lattice: P = flat lattice on {-,0,+}
Actual lattice records a value for each Actual lattice records a value for each variablevariable
Example element: [aExample element: [a+, b+, b0, c0, c-]-]
- 0 +
TOP
BOT
64Fall 2011“Advanced Compiler
Techniques”
Interpretation of Lattice Interpretation of Lattice ValuesValues
If value of v in lattice is:If value of v in lattice is: BOT: no information about sign of vBOT: no information about sign of v -: variable v is negative-: variable v is negative 0: variable v is 0 0: variable v is 0 +: variable v is positive+: variable v is positive TOP: v may be positive or negativeTOP: v may be positive or negative
What is abstraction function AF?What is abstraction function AF? AF([xAF([x11,,……,x,xnn]) = [sign(x]) = [sign(x11), ), ……, sign(x, sign(xnn)])] Where sign(x) = 0 if x = 0, + if x > 0, - if x < 0Where sign(x) = 0 if x = 0, + if x > 0, - if x < 0
65Fall 2011“Advanced Compiler
Techniques”
Transfer FunctionsTransfer Functions If n of the form v = cIf n of the form v = c
ffnn(x) = x[v(x) = x[v+] if c is positive+] if c is positive ffnn(x) = x[v(x) = x[v0] if c is 00] if c is 0 ffnn(x) = x[v(x) = x[v-] if c is negative-] if c is negative
If n of the form vIf n of the form v11 = v = v22*v*v33 ffnn(x) = x[v(x) = x[v11x[vx[v22] ] x[v x[v33]]]]
I = TOP I = TOP (uninitialized variables may have any sign)(uninitialized variables may have any sign)
66Fall 2011“Advanced Compiler
Techniques”
Operation Operation on Lattice on Lattice BOT - 0 + TOP
BOT BOT BOT 0 BOT BOT
- BOT + 0 - TOP
0 0 0 0 0 0
+ BOT - 0 + TOP
TOP BOT TOP 0 TOP TOP
67Fall 2011“Advanced Compiler
Techniques”
Example
b = -1 b = 1
a = 1[a+][a+]
[a+, b+][a+, b-]
[a+, bTOP]c = a*b
[a+, bTOP,c TOP]
68Fall 2011“Advanced Compiler
Techniques”
Imprecision In ExampleImprecision In Example
b = -1 b = 1
a = 1[a+][a+]
[a+, b+][a+, b-]
[a+, bTOP]c = a*b
Abstraction Imprecision:[a1] abstracted as [a+]
Control Flow Imprecision:[bTOP] summarizes results of all executions. In any execution state s, AF(s)[b]TOP
69Fall 2011“Advanced Compiler
Techniques”
General Sources of General Sources of ImprecisionImprecision Abstraction ImprecisionAbstraction Imprecision
Concrete values (integers) abstracted as lattice Concrete values (integers) abstracted as lattice values (-,0, and +)values (-,0, and +)
Lattice values less precise than execution valuesLattice values less precise than execution values Abstraction function throws away informationAbstraction function throws away information
Control Flow ImprecisionControl Flow Imprecision One lattice value for all possible control flow pathsOne lattice value for all possible control flow paths Analysis result has a single lattice value to summarize Analysis result has a single lattice value to summarize
results of multiple concrete executionsresults of multiple concrete executions Join operation Join operation moves up in lattice to combine values moves up in lattice to combine values
from different execution pathsfrom different execution paths Typically if x Typically if x y, then x is more precise than y y, then x is more precise than y
70Fall 2011“Advanced Compiler
Techniques”
Why Have ImprecisionWhy Have Imprecision Make analysis tractableMake analysis tractable Unbounded sets of values in Unbounded sets of values in
executionexecution Typically abstracted by finite set of Typically abstracted by finite set of
lattice valueslattice values Execution may visit unbounded set of Execution may visit unbounded set of
statesstates Abstracted by computing joins of Abstracted by computing joins of
different pathsdifferent paths
71Fall 2011“Advanced Compiler
Techniques”
Abstraction FunctionAbstraction Function AF(s)[v] = sign of vAF(s)[v] = sign of v
AF(n,[aAF(n,[a5, b5, b0, c0, c-2]) = [a-2]) = [a+, b+, b0, c0, c-]-] Establishes meaning of the analysis Establishes meaning of the analysis
resultsresults If analysis says variable has a given signIf analysis says variable has a given sign Always has that sign in actual executionAlways has that sign in actual execution
Correctness condition: Correctness condition: v. AF(s)[v] v. AF(s)[v] in innn[v] (n is node for s)[v] (n is node for s) Reflects possibility of imprecisionReflects possibility of imprecision
72Fall 2011“Advanced Compiler
Techniques”
Abstraction Function Abstraction Function SoundnessSoundness
Will showWill show v. AF(s)[v] v. AF(s)[v] in innn[v] (n [v] (n is node for s)is node for s)by induction on length of computation that by induction on length of computation that
produced sproduced s Base case:Base case:
v. inv. inn0n0[v] = TOP, which implies [v] = TOP, which implies thatthat
v. AF(s)[v] v. AF(s)[v] TOP TOP
73Fall 2011“Advanced Compiler
Techniques”
Induction StepInduction Step Assume Assume v. AF(s)[v] v. AF(s)[v] in innn[v] for computations of length k[v] for computations of length k Prove for computations of length k+1Prove for computations of length k+1 Proof:Proof:
Given s (state), n (node to execute next), and inGiven s (state), n (node to execute next), and innn Find p (the node that just executed), sFind p (the node that just executed), spp(the previous state), (the previous state),
and inand inpp By induction hypothesis By induction hypothesis v. AF(s v. AF(spp)[v] )[v] in inpp[v][v] Case analysis on form of nCase analysis on form of n
If n of the form v = c, then If n of the form v = c, then s[v] = c and outs[v] = c and outp p [v] = sign(c), so[v] = sign(c), so AF(s)[v] AF(s)[v]
= sign(c) = out= sign(c) = outp p [v][v] in innn[v][v] If xIf xv, s[x] = sv, s[x] = sp p [x] and out[x] and outp p [x] = in[x] = inpp[x], so[x], so AF(s)AF(s)
[x] = AF(s[x] = AF(spp)[x] )[x] in inpp[x] = out[x] = outp p [x] [x] in innn[x][x] Similar reasoning if n of the form vSimilar reasoning if n of the form v11 = v = v22*v*v33
74Fall 2011“Advanced Compiler
Techniques”
Augmented Execution Augmented Execution StatesStates
Abstraction functions for some Abstraction functions for some analyses require augmented analyses require augmented execution statesexecution states Reaching definitions: states are Reaching definitions: states are
augmented with definition that created augmented with definition that created each valueeach value
Available expressions: states are Available expressions: states are augmented with expression for each augmented with expression for each valuevalue
75Fall 2011“Advanced Compiler
Techniques”
Meet Over Paths Meet Over Paths SolutionSolution
What solution would be ideal for a forward What solution would be ideal for a forward dataflow analysis problem? dataflow analysis problem?
Consider a path p = nConsider a path p = n00, n, n11, , ……, n, nkk, n to a node n , n to a node n (note that for all i n(note that for all i nii pred(n pred(ni+1i+1))))
The solution must take this path into account:The solution must take this path into account:ffpp ( () = (f) = (fnknk(f(fnk-1nk-1((……ffn1n1(f(fn0n0(()) )) ……)) )) in innn
So the solution must have the property that So the solution must have the property that {f{fpp ( () . p is a path to n} ) . p is a path to n} in innn
and ideallyand ideally
{f{fpp ( () . p is a path to n} = in) . p is a path to n} = innn
76Fall 2011“Advanced Compiler
Techniques”
Soundness Proof of Soundness Proof of Analysis AlgorithmAnalysis Algorithm
Property to prove:Property to prove:For all paths p to n, fFor all paths p to n, fpp ( () ) in innn
Proof is by induction on length of pProof is by induction on length of p Uses monotonicity of transfer functionsUses monotonicity of transfer functions Uses following lemmaUses following lemma
Lemma:Lemma:Worklist algorithm produces a solution such thatWorklist algorithm produces a solution such that
ffnn(in(innn) = out) = outnn
if n if n pred(m) then out pred(m) then outnn in inmm
77Fall 2011“Advanced Compiler
Techniques”
ProofProof Base case: p is of length 1Base case: p is of length 1
Then p = nThen p = n0 0 and fand fpp(() = ) = = in = inn0n0
Induction step:Induction step: Assume theorem for all paths of length kAssume theorem for all paths of length k Show for an arbitrary path p of length Show for an arbitrary path p of length
k+1k+1
78Fall 2011“Advanced Compiler
Techniques”
Induction Step ProofInduction Step Proof p = np = n00, , ……, n, nkk, n, n Must show fMust show fkk(f(fk-1k-1((……ffn1n1(f(fn0n0(()) )) ……)) )) in innn
By induction (fBy induction (fk-1k-1((……ffn1n1(f(fn0n0(()) )) ……)) )) in innknk Apply fApply fk k to both sides, by monotonicity we to both sides, by monotonicity we
getget ffkk(f(fk-1k-1((……ffn1n1(f(fn0n0(()) )) ……)) )) f fkk(in(innknk) ) By lemma, fBy lemma, fkk(in(innknk) = out) = outnknk By lemma, outBy lemma, outnk nk in innn By transitivity, fBy transitivity, fkk(f(fk-1k-1((……ffn1n1(f(fn0n0(()) )) ……)) )) in innn
79Fall 2011“Advanced Compiler
Techniques”
DistributivityDistributivity Distributivity preserves precisionDistributivity preserves precision If framework is distributive, then If framework is distributive, then
worklist algorithm produces the meet worklist algorithm produces the meet over paths solutionover paths solution For all n:For all n:
{f{fpp ( () . p is a path to n} = in) . p is a path to n} = innn
80Fall 2011“Advanced Compiler
Techniques”
Lack of Distributivity Lack of Distributivity ExampleExample
Constant CalculatorConstant Calculator Flat Lattice on IntegersFlat Lattice on Integers
Actual lattice records a value for each Actual lattice records a value for each variablevariable Example element: [aExample element: [a3, b3, b2, c2, c5]5]
-1 10
TOP
BOT
-2 2 ……
81Fall 2011“Advanced Compiler
Techniques”
Transfer FunctionsTransfer Functions If n of the form v = cIf n of the form v = c
ffnn(x) = x[v(x) = x[vc]c] If n of the form vIf n of the form v11 = v = v22+v+v33
ffnn(x) = x[v(x) = x[v11x[vx[v22] + x[v] + x[v33]]]] Lack of distributivityLack of distributivity
Consider transfer function f for c = a + bConsider transfer function f for c = a + b f([af([a3, b3, b2]) 2]) f([a f([a2, b2, b3]) = [a3]) = [aTOP, bTOP, bTOP, cTOP, c5]5] f([af([a3, b3, b2]2][a[a2, b2, b3]) = f([a3]) = f([aTOP, bTOP, bTOP]) = TOP]) =
[a[aTOP, bTOP, bTOP, cTOP, cTOP]TOP]
82Fall 2011“Advanced Compiler
Techniques”
Lack of Distributivity Lack of Distributivity AnomalyAnomaly
a = 2b = 3
a = 3b = 2
[a3, b2][a2, b3]
[aTOP, bTOP]c = a+b
[aTOP, bTOP, c TOP]
Lack of Distributivity Imprecision: [aTOP, bTOP, c5] more precise
What is the meet over all paths solution?
83Fall 2011“Advanced Compiler
Techniques”
How to Make Analysis How to Make Analysis DistributiveDistributive
Keep combinations of values on different Keep combinations of values on different pathspaths
a = 2b = 3
a = 3b = 2
{[a3, b2]}{[a2, b3]}
{[a2, b3], [a3, b2]} c = a+b
{[a2, b3,c5], [a3, b2,c5]}
84Fall 2011“Advanced Compiler
Techniques”
IssuesIssues Basically simulating all combinations of Basically simulating all combinations of
values in all executionsvalues in all executions Exponential blowupExponential blowup Nontermination because of infinite Nontermination because of infinite
ascending chainsascending chains Nontermination solutionNontermination solution
Use widening operator to eliminate blowup Use widening operator to eliminate blowup (can make it work at granularity of (can make it work at granularity of variables)variables)
Loses precision in many casesLoses precision in many cases
85Fall 2011“Advanced Compiler
Techniques”
SummarySummary Formal dataflow analysis frameworkFormal dataflow analysis framework
Lattices, partial ordersLattices, partial orders Transfer functions, joins and splitsTransfer functions, joins and splits Dataflow equations and fixed point solutionsDataflow equations and fixed point solutions
Connection with programConnection with program Abstraction function AF: S Abstraction function AF: S P P For any state s and program point n, AF(s) For any state s and program point n, AF(s)
ininnn Meet over all paths solutions, distributivityMeet over all paths solutions, distributivity
86Fall 2011“Advanced Compiler
Techniques”
Next TimeNext Time Partial Redundancy EliminationPartial Redundancy Elimination
DragonBook DragonBook §9.59.5