data flow analysis foundations

School of EECS, Peking University“Advanced Compiler Techniques” (Fall 2011)

Data Flow Data Flow AnalysisAnalysis

FoundationsFoundationsGuo, YaoGuo, Yao

Part of the slides are adapted from MIT 6.035 “Computer Language Engineering”

2Fall 2011“Advanced Compiler

Techniques”

Dataflow AnalysisDataflow Analysis Compile-Time Reasoning AboutCompile-Time Reasoning About

Run-Time Values of Variables or Run-Time Values of Variables or ExpressionsExpressions

At Different Program PointsAt Different Program Points Which assignment statements produced Which assignment statements produced

value of variable at this point?value of variable at this point? Which variables contain values that are Which variables contain values that are

no longer used after this program point?no longer used after this program point? What is the range of possible values of What is the range of possible values of

variable at this program point?variable at this program point?


Techniques”

Program RepresentationProgram Representation Control Flow GraphControl Flow Graph

Nodes N Nodes N –– statements of program statements of program Edges E Edges E –– flow of control flow of control

pred(n) = set of all predecessors of npred(n) = set of all predecessors of n succ(n) = set of all successors of nsucc(n) = set of all successors of n

Start node nStart node n00 Set of final nodes NSet of final nodes Nfinalfinal

Could also add special nodes ENTRY & EXITCould also add special nodes ENTRY & EXIT


Techniques”

Program PointsProgram Points One program point before each nodeOne program point before each node One program point after each nodeOne program point after each node Join point Join point –– point with multiple point with multiple

predecessorspredecessors Split point Split point –– point with multiple point with multiple

successorssuccessors


Techniques”

Basic IdeaBasic Idea Information about program Information about program

represented using values from represented using values from algebraic structure called algebraic structure called latticelattice

Analysis produces lattice value for Analysis produces lattice value for each program pointeach program point

Two flavors of analysisTwo flavors of analysis Forward dataflow analysisForward dataflow analysis Backward dataflow analysisBackward dataflow analysis


Techniques”

Forward Dataflow Forward Dataflow AnalysisAnalysis

Analysis propagates values forward through control Analysis propagates values forward through control flow graph with flow of controlflow graph with flow of control Each node has a transfer function fEach node has a transfer function f

Input Input –– value at program point before node value at program point before node Output Output –– new value at program point after node new value at program point after node

Values flow from program points after predecessor Values flow from program points after predecessor nodes to program points before successor nodesnodes to program points before successor nodes

At join points, values are combined using a merge At join points, values are combined using a merge function function

Canonical Example: Reaching DefinitionsCanonical Example: Reaching Definitions


Techniques”

Backward Dataflow Backward Dataflow AnalysisAnalysis

Analysis propagates values backward through Analysis propagates values backward through control flow graph against flow of controlcontrol flow graph against flow of control Each node has a transfer function fEach node has a transfer function f

Input Input –– value at program point after node value at program point after node Output Output –– new value at program point before new value at program point before

nodenode Values flow from program points before Values flow from program points before

successor nodes to program points after successor nodes to program points after predecessor nodespredecessor nodes

At split points, values are combined using a At split points, values are combined using a merge functionmerge function

Canonical Example: Live VariablesCanonical Example: Live Variables


Techniques”

Partial OrdersPartial Orders Set PSet P Partial order Partial order such that such that x,y,zx,y,zPP

x x x x (reflexive)(reflexive) x x y and y y and y x implies x x implies x y y (asymmetric)(asymmetric) x x y and y y and y z implies x z implies x z z (transitive)(transitive)

Can use partial order to defineCan use partial order to define Upper and lower boundsUpper and lower bounds Least upper boundLeast upper bound Greatest lower boundGreatest lower bound


Techniques”

Upper BoundsUpper Bounds If S If S P then P then

xxP is an upper bound of S if P is an upper bound of S if yyS. y S. y xx

xxP is the least upper bound of S ifP is the least upper bound of S if x is an upper bound of S, and x is an upper bound of S, and x x y for all upper bounds y of S y for all upper bounds y of S

- join, least upper bound (lub), - join, least upper bound (lub), supremum, supsupremum, sup S is the least upper bound of SS is the least upper bound of S x x y is the least upper bound of {x,y} y is the least upper bound of {x,y}


Techniques”

LowerLower BoundsBounds If S P then

xP is a lower bound of S if yS. x y xP is the greatest lower bound of S if

x is a lower bound of S, and y x for all lower bounds y of S

- meet, greatest lower bound (glb), infimum, inf

S is the greatest lower bound of S x y is the greatest lower bound of {x,y}


Techniques”

CoveringCovering xx y if x y if x y and x y and xy y x is covered by y (y covers x) ifx is covered by y (y covers x) if

x x y, and y, and x x z z y implies x y implies x z z

Conceptually, y covers x if there are Conceptually, y covers x if there are no elements between x and yno elements between x and y


Techniques”

ExampleExample P = { 000, 001, 010, 011, 100, 101, 110, 111}P = { 000, 001, 010, 011, 100, 101, 110, 111}

(standard Boolean lattice, also called hypercube)(standard Boolean lattice, also called hypercube) x x y if (x bitwise and y) = x y if (x bitwise and y) = x

111

011101

110

010001

000

100

Hasse Diagram• If y covers x

• Line from y to x• y above x in

diagram


Techniques”

LatticesLattices If x If x y and x y and x y exist for all x,y y exist for all x,yP, P,

then P is a lattice.then P is a lattice. If If S and S and S exist for all S S exist for all S P, P,

then P is a complete lattice.then P is a complete lattice. All finite lattices are completeAll finite lattices are complete


Techniques”

LatticesLattices If x If x y and x y and x y exist for all x,y y exist for all x,yP, P, then P then P

is a lattice.is a lattice. If If S and S and S exist for all S S exist for all S P, P, then P is a then P is a

complete lattice.complete lattice. All finite lattices are completeAll finite lattices are complete Example of a lattice that is not completeExample of a lattice that is not complete

Integers IIntegers I For any x, yFor any x, yI, x I, x y = max(x,y), x y = max(x,y), x y = min(x,y) y = min(x,y) But But I and I and I do not exist I do not exist I I { {,, } is a complete lattice } is a complete lattice


Techniques”

Lattice ExamplesLattice Examples LatticesLattices

Non-latticesNon-lattices


Techniques”

Semi-LatticeSemi-Lattice Only one of the two binary Only one of the two binary

operations (meet or join) existoperations (meet or join) exist Meet-semilatticeMeet-semilattice If x If x y exist for all x,y y exist for all x,yPP Join-semilattice Join-semilattice If x If x y exist for all x,y y exist for all x,yPP


Techniques”

Top and BottomTop and Bottom Greatest element of P (if it exists) is Greatest element of P (if it exists) is

top (top (⊤⊤)) Least element of P (if it exists) is Least element of P (if it exists) is

bottom (bottom ())


Techniques”

Connection Between Connection Between , , , , and and

The following 3 properties are equivalent:The following 3 properties are equivalent: x x y y x x y y y y x x y y x x

Will prove:Will prove: x x y implies x y implies x y y y and x y and x y y x x x x y y y implies x y implies x y y x x y y x implies x x implies x y y

Then by transitivity, can obtain Then by transitivity, can obtain x x y y y implies x y implies x y y x x x x y y x implies x x implies x y y y y


Techniques”

Connecting Lemma Connecting Lemma ProofsProofs

Proof of x Proof of x y implies x y implies x y y y y x x y implies y is an upper bound of {x,y}. y implies y is an upper bound of {x,y}. Any upper bound z of {x,y} must satisfy y Any upper bound z of {x,y} must satisfy y z. z. So y is least upper bound of {x,y} and x So y is least upper bound of {x,y} and x y y y y

Proof of x Proof of x y implies x y implies x y y x x x x y implies x is a lower bound of {x,y}. y implies x is a lower bound of {x,y}. Any lower bound z of {x,y} must satisfy z Any lower bound z of {x,y} must satisfy z x. x. So x is greatest lower bound of {x,y} and x So x is greatest lower bound of {x,y} and x y y

x x


Techniques”

Connecting Lemma Connecting Lemma ProofsProofs

Proof of x Proof of x y y y implies x y implies x y y y is an upper bound of {x,y} implies x y is an upper bound of {x,y} implies x

yy Proof of x Proof of x y y x implies x x implies x y y

x is a lower bound of {x,y} implies x x is a lower bound of {x,y} implies x y y


Techniques”

Lattices as Algebraic Lattices as Algebraic StructuresStructures

Have defined Have defined and and in terms of in terms of Will now define Will now define in terms of in terms of and and

Start with Start with and and as arbitrary algebraic as arbitrary algebraic operations that satisfy associative, operations that satisfy associative, commutative, idempotence, and absorption commutative, idempotence, and absorption lawslaws

Will define Will define using using and and Will show that Will show that is a partial order is a partial order

Intuitive concept of Intuitive concept of and and as information as information combination operators (or, and)combination operators (or, and)


Techniques”

Lattice as AlgebraLattice as Algebra定理定理设设 <<SS, , ∗, , ∗ ∘∘>> 是具有两个二元运算和∗是具有两个二元运算和∗ 的代数系统的代数系统 , , 若对于和∗若对于和∗ 满足交换律、结合满足交换律、结合律、吸收律律、吸收律 , , 则可以适当定义则可以适当定义 SS 中的偏序≼中的偏序≼ , , 使得使得 <<SS, >≼, >≼ 构成格构成格 , , 且且 aa,,bb∈∈SS 有有 aa∧∧b b = = aa∗∗bb, , aa∨∨b b = = aa ∘∘ bb..定义定义设设 <<SS, ,∗, ,∗ ∘∘>> 是代数系统是代数系统 , ∗, ∗ 和和 ∘∘是二元是二元运算运算 , , 如果和∗如果和∗ ∘∘ 运算满足交换律、结合律运算满足交换律、结合律和吸收律和吸收律 , , 则则 <<SS, ,∗, ,∗ ∘∘>> 构成格构成格 ..


Techniques”

Algebraic Properties of Algebraic Properties of LatticesLattices

Assume arbitrary operations Assume arbitrary operations and and such that such that (x (x y) y) z z x x (y (y z) z) (associativity of (associativity of )) (x (x y) y) z z x x (y (y z) z) (associativity of (associativity of )) x x y y y y x x (commutativity of (commutativity of )) x x y y y y x x (commutativity of (commutativity of )) x x x x x x (idempotence of (idempotence of )) x x x x x x (idempotence of (idempotence of )) x x (x (x y) y) x x (absorption of (absorption of over over )) x x (x (x y) y) x x (absorption of (absorption of over over ))


Techniques”

Connection Between Connection Between and and

x x y y y if and only if x y if and only if x y y x x Proof of x Proof of x y y y implies x = x y implies x = x y y

x = x x = x (x (x y) y) (by absorption)(by absorption) = x = x y y (by assumption)(by assumption)

Proof of x Proof of x y y x implies y = x x implies y = x y yy = y y = y (y (y x) x) (by absorption)(by absorption) = y = y (x (x y) y) (by commutativity)(by commutativity) = y = y x x (by assumption)(by assumption) = x = x y y (by commutativity)(by commutativity)


Techniques”

Properties of Properties of Define x Define x y if x y if x y y y y Proof of transitive property. Must Proof of transitive property. Must

show thatshow that x x y y y and y y and y z z z implies x z implies x z z zz

x x z = x z = x (y (y z) z) (by assumption)(by assumption) = (x = (x y) y) z z (by associativity)(by associativity) = y = y z z (by assumption)(by assumption) = z= z (by assumption)(by assumption)


Techniques”

Properties of Properties of Proof of asymmetry property. Must show Proof of asymmetry property. Must show

thatthatx x y y y and y y and y x x x implies x x implies x y y

x = y x = y x x (by assumption)(by assumption) = x = x y y (by commutativity)(by commutativity) = y = y (by assumption)(by assumption)

Proof of reflexivity property. Must show Proof of reflexivity property. Must show thatthatx x x x x x x x x x x x (by idempotence)(by idempotence)


Techniques”

Monotonic Function & Fixed Monotonic Function & Fixed pointpoint

Let Let LL be a lattice. be a lattice. A function A function ff : : L L → → LL is is monotonic ifmonotonic if

∀∀x, yx, y ∈ S : ∈ S : xx yy ⇒ ⇒ f f ((xx) ) f f ((yy))

Let Let AA be a set, be a set, ff : : AA → → AA a function, a function, aa ∈ ∈AA ..If If f f ((aa) = ) = aa, then , then aa is called a is called a fixed pointfixed point of of f f on on AA


Techniques”

Existence of Fixed PointsExistence of Fixed Points• The The heightheight of a lattice is defined to of a lattice is defined to

be the length of the longest path be the length of the longest path from from ⊥⊥ to to ⊤⊤

• In a complete lattice In a complete lattice LL with finite with finite height, every monotonic function height, every monotonic function ff : : LL → → LL has a has a uniqueunique least fixed-pointleast fixed-point : :

fix fix ((f f ) = ) =

0( )i

if


Techniques”

Another Existence Another Existence TheoremTheorem

A function A function ff : : LL → → LL on a complete on a complete lattice lattice LL is called continuous if for is called continuous if for any subset any subset XX of of LL, , f f ((XX) = ) = f f ((XX). ).

Let Let LL be a complete lattice. If be a complete lattice. If ff : : LL → → LL is monotonic and continuous, then is monotonic and continuous, then ff has a has a uniqueunique least fixed-pointleast fixed-point

fix fix ((f f ) = ) =

0( )i

if


Techniques”

Knaster-Tarski Knaster-Tarski Fixed Point TheoremFixed Point Theorem

Suppose (L, Suppose (L, ) is a complete lattice, ) is a complete lattice, ff: L: LL is a monotonic function.L is a monotonic function.

Then the fixed point m of Then the fixed point m of ff can be can be defined asdefined as


Techniques”

Calculating Fixed PointCalculating Fixed Point The time complexity of computing a The time complexity of computing a

fixed-point depends on three factors:fixed-point depends on three factors: The height of the lattice, since this The height of the lattice, since this

provides a bound for i;provides a bound for i; The cost of computing f;The cost of computing f; The cost of testing equality.The cost of testing equality.

The computation of a fixed-point can be illustrated as a walk up the lattice starting at ⊥:


Techniques”

Fixed Point SolutionFixed Point SolutionLet L be a lattice with finite height. Let L be a lattice with finite height. An An equation systemequation system is of the form: is of the form:

xx11 = F = F11(x(x11, . . . , x, . . . , xnn))xx22 = F = F22(x(x11, . . . , x, . . . , xnn))......xxnn = F = Fnn(x(x11, . . . , x, . . . , xnn))where xwhere xii are variables and F are variables and Fii : L : Lnn → L is a → L is a

collection of monotone functions.collection of monotone functions.The solution can be obtained using the The solution can be obtained using the least least

fixed-pointfixed-point of the function F : L of the function F : Lnn → L → Lnn defined by:defined by:

F(xF(x11,...,x,...,xnn) = (F) = (F11(x(x11,...,x,...,xnn), ..., F), ..., Fnn(x(x11,...,x,...,xnn))))


Techniques”

AlgorithmAlgorithm The naive algorithm is then to proceed as follows:

x = (⊥, . . . ,⊥);do { t = x; x = F(x); }while (x ≠ t);

to compute the fixed-point x.

A better algorithm:x1 = ⊥; . . . xn = ⊥;do {

t1 = x1; . . . tn = xn;x1 = F1(x1, . . . , xn);. . .xn = Fn(x1, . . . , xn);

} while (x1 ≠ t1 ∨ . . . ∨ xn ≠ tn) to compute the fixed-point (x1, . . . , xn).

Why Better?


Techniques”

ChainsChains A set S is a A set S is a chainchain if if x,yx,yS. y S. y x or x x or x

y y P has no infinite chains if every chain in P has no infinite chains if every chain in

P is finiteP is finite P satisfies the P satisfies the ascending chainascending chain

conditioncondition if for all sequences x if for all sequences x11 x x22 ……there exists n, such that xthere exists n, such that xnn = x = xn+1n+1 = = ……


Techniques”

Application to Dataflow Application to Dataflow AnalysisAnalysis

Dataflow information will be lattice Dataflow information will be lattice valuesvalues Transfer functions operate on lattice valuesTransfer functions operate on lattice values Solution algorithm will generate increasing Solution algorithm will generate increasing

sequence of values at each program pointsequence of values at each program point Ascending chain condition will ensure Ascending chain condition will ensure

terminationtermination Will use Will use to combine values at control- to combine values at control-

flow join pointsflow join points


Techniques”

Transfer FunctionsTransfer Functions Transfer function Transfer function ff: P: PP for each P for each

node in control flow graphnode in control flow graph ff models effect of the node on the models effect of the node on the

program informationprogram information


Techniques”

Transfer FunctionsTransfer FunctionsEach dataflow analysis problem has a Each dataflow analysis problem has a

set F of transfer functions f: Pset F of transfer functions f: PPP Identity function iIdentity function iFF F must be closed under composition: F must be closed under composition:

f,gf,gF. the function h = F. the function h = x.f(g(x)) x.f(g(x)) FF Each f Each f F must be monotone:F must be monotone:

x x y implies f(x) y implies f(x) f(y) f(y) Sometimes all fSometimes all fF are distributive: F are distributive:

f(x f(x y) = f(x) y) = f(x) f(y) f(y) Distributivity implies monotonicityDistributivity implies monotonicity


Techniques”

Distributivity Implies Distributivity Implies MonotonicityMonotonicity

Proof of distributivity implies Proof of distributivity implies monotonicitymonotonicity

Assume f(x Assume f(x y) = f(x) y) = f(x) f(y) f(y) Must show: x Must show: x y = y implies f(x) y = y implies f(x) f(y) f(y)

= f(y)= f(y)f(y) = f(x f(y) = f(x y) y) (by assumption)(by assumption) = f(x) = f(x) f(y) f(y) (by distributivity)(by distributivity)


Techniques”

Putting Pieces TogetherPutting Pieces Together Forward Dataflow Analysis Forward Dataflow Analysis

FrameworkFramework Simulates execution of program Simulates execution of program

forward with flow of controlforward with flow of control


Techniques”

Forward Dataflow Forward Dataflow AnalysisAnalysis

Simulates execution of program forward with Simulates execution of program forward with flow of controlflow of control

For each node n, haveFor each node n, have ininnn –– value at program point before n value at program point before n outoutnn –– value at program point after n value at program point after n ffnn –– transfer function for n (given in transfer function for n (given innn, computes , computes

outoutnn)) Require that solution satisfyRequire that solution satisfy

n. outn. outnn = f = fnn(in(innn)) n n n n00. in. inn n = = { out { outmm . m in pred(n) } . m in pred(n) } ininn0n0 = I = I

Where I summarizes information at start of programWhere I summarizes information at start of program


Techniques”

Dataflow EquationsDataflow Equations Compiler processes program to Compiler processes program to

obtain a set of dataflow equationsobtain a set of dataflow equationsoutoutnn := f := fnn(in(innn))

ininnn := := { out { outmm . m in pred(n) } . m in pred(n) } Conceptually separates analysis Conceptually separates analysis

problem from programproblem from program


Techniques”

Worklist Algorithm for Worklist Algorithm for Solving Forward Dataflow Solving Forward Dataflow

EquationsEquationsfor each n do outfor each n do outnn := f := fnn(())ininn0n0 := I; out := I; outn0 n0 := f:= fn0n0(I)(I)worklist := N - { nworklist := N - { n00 } }while worklist while worklist do do

remove a node n from worklistremove a node n from worklistininnn := := { out { outmm . m in pred(n) } . m in pred(n) }outoutnn := f := fnn(in(innn))if outif outnn changed then changed then worklist := worklist worklist := worklist succ(n) succ(n)


Techniques”

Correctness ArgumentCorrectness Argument Why result satisfies dataflow equationsWhy result satisfies dataflow equations Whenever process a node Whenever process a node nn, set , set

outoutnn := f := fnn(in(innn) ) Algorithm ensures that outAlgorithm ensures that outnn = f = fnn(in(innn) )

Whenever outWhenever outm m changes, put succ(m) on changes, put succ(m) on worklist. Consider any node n worklist. Consider any node n succ(m). It will succ(m). It will eventually come off worklist and algorithm will eventually come off worklist and algorithm will set set ininnn := := { out { outmm . m in pred(n) } . m in pred(n) } to ensure to ensure that that ininnn = = { out { outmm . m in pred(n) } . m in pred(n) }

So final solution will satisfy dataflow equations So final solution will satisfy dataflow equations


Techniques”

Termination ArgumentTermination Argument Why does algorithm terminate?Why does algorithm terminate? Sequence of values taken on by inSequence of values taken on by innn or or

outoutn n is a chain. If values stop increasing, is a chain. If values stop increasing, worklist empties and algorithm worklist empties and algorithm terminates.terminates.

If lattice has ascending chain property, If lattice has ascending chain property, algorithm terminatesalgorithm terminates Algorithm terminates for finite latticesAlgorithm terminates for finite lattices For lattices without ascending chain property, For lattices without ascending chain property,

use widening operatoruse widening operator


Techniques”

Reaching DefinitionsReaching Definitions P = powerset of set of all definitions in program P = powerset of set of all definitions in program

(all subsets of set of definitions in program)(all subsets of set of definitions in program) = = (order is (order is )) = = I = inI = inn0n0 = = F = all functions f of the form f(x) = a F = all functions f of the form f(x) = a (x-b) (x-b)

b is set of definitions that node killsb is set of definitions that node kills a is set of definitions that node generatesa is set of definitions that node generates

General pattern for many transfer functionsGeneral pattern for many transfer functions f(x) = GEN f(x) = GEN (x-KILL) (x-KILL)


Techniques”

Does Reaching Definitions Does Reaching Definitions Framework Satisfy Framework Satisfy

Properties?Properties? satisfies conditions for satisfies conditions for x x y and y y and y z implies x z implies x z (transitivity) z (transitivity) x x y and y y and y x implies y = x (asymmetry) x implies y = x (asymmetry) x x x x (idempotence) (idempotence)

F satisfies transfer function conditionsF satisfies transfer function conditions x.x. (x- (x- ) = ) = x.xx.xF F (identity)(identity) Will show f(x Will show f(x y) = f(x) y) = f(x) f(y) f(y) (distributivity)(distributivity)

f(x) f(x) f(y) = (a f(y) = (a (x (x –– b)) b)) (a (a (y (y –– b)) b)) = a = a (x (x –– b) b) (y (y –– b) = a b) = a ((x ((x y) y) ––

b)b) = f(x = f(x y) y)


Techniques”

Does Reaching Definitions Does Reaching Definitions Framework Satisfy Framework Satisfy

Properties?Properties? What about composition?What about composition?

Given fGiven f11(x) = a(x) = a11 (x-b (x-b11) and f) and f22(x) = a(x) = a22 (x-b (x-b22)) Must show fMust show f11(f(f22(x)) can be expressed as a (x)) can be expressed as a (x (x

- b)- b)ff11(f(f22(x)) = a(x)) = a11 ((a ((a22 (x-b (x-b22)) - b)) - b11)) = a= a11 ((a ((a22 - b - b11) ) ((x-b ((x-b22) - b) - b11)))) = (a= (a11 (a (a22 - b - b11)) )) ((x-b ((x-b22) - b) - b11)))) = (a= (a11 (a (a22 - b - b11)) )) (x-(b (x-(b22 b b11))))

Let a = (aLet a = (a11 (a (a22 - b - b11)) and b = b)) and b = b22 b b11 Then fThen f11(f(f22(x)) = a (x)) = a (x (x –– b) b)


Techniques”

Available ExpressionsAvailable Expressions P = powerset of set of all expressions in P = powerset of set of all expressions in

program (all subsets of set of expressions)program (all subsets of set of expressions) = = (order is (order is )) = P = P I = inI = inn0n0 = = F = all functions f of the form f(x) = a F = all functions f of the form f(x) = a

(x-b)(x-b) b is set of expressions that node killsb is set of expressions that node kills a is set of expressions that node generatesa is set of expressions that node generates

Another GEN/KILL analysisAnother GEN/KILL analysis


Techniques”

Concept of ConservatismConcept of Conservatism Reaching definitions use Reaching definitions use as join as join

Optimizations must take into account all Optimizations must take into account all definitions that reach along ANY pathdefinitions that reach along ANY path

Available expressions use Available expressions use as join as join Optimization requires expression to Optimization requires expression to

reach along ALL pathsreach along ALL paths Optimizations must conservatively Optimizations must conservatively

take all possible executions into take all possible executions into account. account.


Techniques”

Backward Dataflow Analysis

Simulates execution of program backward against the flow of control

For each node n, have inn – value at program point before n outn – value at program point after n fn – transfer function for n (given outn, computes

inn) Require that solution satisfies

n. inn = fn(outn) n Nfinal. outn = { inm . m in succ(n) } n Nfinal = outn = O Where O summarizes information at end of

program


Techniques”

Worklist Algorithm for Solving Backward Dataflow

Equationsfor each n do inn := fn()for each n Nfinal do outn := O; inn := fn(O)worklist := N - Nfinal while worklist do

remove a node n from worklistoutn := { inm . m in succ(n) }inn := fn(outn)if inn changed then worklist := worklist pred(n)


Techniques”

Live VariablesLive Variables P = powerset of set of all variables in P = powerset of set of all variables in

program (all subsets of set of variables in program (all subsets of set of variables in program)program)

= = (order is (order is )) = = O = O = F = all functions f of the form f(x) = a F = all functions f of the form f(x) = a

(x-b)(x-b) b is set of variables that node killsb is set of variables that node kills a is set of variables that node readsa is set of variables that node reads


Techniques”

Very Busy ExpressionsVery Busy Expressions An expression is An expression is very busyvery busy if it will if it will definitelydefinitely

be evaluated again before its value changes.be evaluated again before its value changes. Also called “anticipated” expressionsAlso called “anticipated” expressions

We need the same lattice and auxiliary We need the same lattice and auxiliary functions as for available expressions.functions as for available expressions.

For every CFG node v the variable [v] For every CFG node v the variable [v] denotes the set of expressions that at the denotes the set of expressions that at the program point before the node definitely are program point before the node definitely are busy.busy.


Techniques”

Formulate the AnalysisFormulate the Analysis P = powerset of set of all expressions in P = powerset of set of all expressions in

program (all subsets of set of expressions)program (all subsets of set of expressions) = = (order is (order is )) = P = P I = inI = inExitExit = = F = all functions f of the form f(x) = a F = all functions f of the form f(x) = a (x- (x-

b)b) b is set of expressions that node killsb is set of expressions that node kills a is set of expressions that node evaluatesa is set of expressions that node evaluates


Techniques”

ExampleExamplevar x,a,b;var x,a,b;x = input;x = input;a = x-1;a = x-1;b = x-2;b = x-2;while (x>0) {while (x>0) {

output output a*ba*b-x;-x;x = x-1;x = x-1;

}}output a*b;output a*b;

var x,a,b,atimesb;x = input;a = x-1;b = x-2;atimesb = a*b;while (x>0) {

output atimesb-x;

x = x-1;}

output atimesb;


Techniques”

Comparisons of Four Comparisons of Four AnalysesAnalyses

Reaching DefinitionsReaching Definitions Available ExpressionsAvailable Expressions Live VariablesLive Variables Very Busy ExpressionsVery Busy Expressions


Techniques”

Forwards vs. backwardsForwards vs. backwards A A forwardsforwards analysis is one that for each analysis is one that for each

program point computes information program point computes information about the about the pastpast behavior. behavior. Examples of this are available expressions and Examples of this are available expressions and

reaching definitions.reaching definitions. Calculation: Calculation: predecessorspredecessors of CFG nodes. of CFG nodes.

A A backwardsbackwards analysis is one that for each analysis is one that for each program point computes information program point computes information about the about the futurefuture behavior. behavior. Examples of this are liveness and very busy Examples of this are liveness and very busy

expressions.expressions. Calculation: Calculation: successorssuccessors of CFG nodes. of CFG nodes.


Techniques”

May vs. MustMay vs. Must A A maymay analysis is one that describes analysis is one that describes

information that may possibly be true and, information that may possibly be true and, thus, computes an thus, computes an upperupper approximation. approximation. Examples of this are liveness and reaching Examples of this are liveness and reaching

definitions.definitions. Calculation: Calculation: unionunion operator. operator.

A A mustmust analysis is one that describes analysis is one that describes information that must definitely be true information that must definitely be true and, thus, computes a and, thus, computes a lower lower approximation. approximation. Examples of this are available expressions and Examples of this are available expressions and

very busy expressions.very busy expressions. Calculation: Calculation: intersectionintersection operator. operator.


Techniques”

Classification of AnalysesClassification of Analyses

ForwardsForwards BackwardsBackwards

MayMay Reaching Reaching DefinitionsDefinitions Live VariablesLive Variables

MustMust Available Available ExpressionsExpressions

Very Busy Very Busy ExpressionsExpressions


Techniques”

Example: Initialization Example: Initialization Try define an analysis that ensures

that variables are initialized before they are read.

Lattice? Powerset of variables in the given

program Forwards or Backwards?

Forwards May or Must?

MustMust


Techniques”

Meaning of Dataflow Meaning of Dataflow ResultsResults

Concept of program state s for control-flow graphsConcept of program state s for control-flow graphs• Program point n where execution located Program point n where execution located

(n is node that will execute next) (n is node that will execute next)• Values of variables in programValues of variables in program

Each execution generates a trajectory of states:Each execution generates a trajectory of states: ss00;s;s11;;……;s;skk,where each s,where each sii STST ssi+1i+1 generated from s generated from si i by executing basic block to by executing basic block to

Update variable valuesUpdate variable values Obtain new program point nObtain new program point n


Techniques”

Relating States to Analysis Result

Meaning of analysis results is given by an abstraction function AF:STP

Correctness condition: require that for all states s

AF(s) inn where n is the next statement to

execute in state s


Techniques”

Sign Analysis ExampleSign Analysis Example Sign analysis - compute sign of each Sign analysis - compute sign of each

variable vvariable v Base Lattice: P = flat lattice on {-,0,+}Base Lattice: P = flat lattice on {-,0,+}

Actual lattice records a value for each Actual lattice records a value for each variablevariable

Example element: [aExample element: [a+, b+, b0, c0, c-]-]

- 0 +

TOP

BOT


Techniques”

Interpretation of Lattice Interpretation of Lattice ValuesValues

If value of v in lattice is:If value of v in lattice is: BOT: no information about sign of vBOT: no information about sign of v -: variable v is negative-: variable v is negative 0: variable v is 0 0: variable v is 0 +: variable v is positive+: variable v is positive TOP: v may be positive or negativeTOP: v may be positive or negative

What is abstraction function AF?What is abstraction function AF? AF([xAF([x11,,……,x,xnn]) = [sign(x]) = [sign(x11), ), ……, sign(x, sign(xnn)])] Where sign(x) = 0 if x = 0, + if x > 0, - if x < 0Where sign(x) = 0 if x = 0, + if x > 0, - if x < 0


Techniques”

Transfer FunctionsTransfer Functions If n of the form v = cIf n of the form v = c

ffnn(x) = x[v(x) = x[v+] if c is positive+] if c is positive ffnn(x) = x[v(x) = x[v0] if c is 00] if c is 0 ffnn(x) = x[v(x) = x[v-] if c is negative-] if c is negative

If n of the form vIf n of the form v11 = v = v22*v*v33 ffnn(x) = x[v(x) = x[v11x[vx[v22] ] x[v x[v33]]]]

I = TOP I = TOP (uninitialized variables may have any sign)(uninitialized variables may have any sign)


Techniques”

Operation Operation on Lattice on Lattice BOT - 0 + TOP

BOT BOT BOT 0 BOT BOT

- BOT + 0 - TOP

0 0 0 0 0 0

+ BOT - 0 + TOP

TOP BOT TOP 0 TOP TOP


Techniques”

Example

b = -1 b = 1

a = 1[a+][a+]

[a+, b+][a+, b-]

[a+, bTOP]c = a*b

[a+, bTOP,c TOP]


Techniques”

Imprecision In ExampleImprecision In Example

b = -1 b = 1

a = 1[a+][a+]

[a+, b+][a+, b-]

[a+, bTOP]c = a*b

Abstraction Imprecision:[a1] abstracted as [a+]

Control Flow Imprecision:[bTOP] summarizes results of all executions. In any execution state s, AF(s)[b]TOP


Techniques”

General Sources of General Sources of ImprecisionImprecision Abstraction ImprecisionAbstraction Imprecision

Concrete values (integers) abstracted as lattice Concrete values (integers) abstracted as lattice values (-,0, and +)values (-,0, and +)

Lattice values less precise than execution valuesLattice values less precise than execution values Abstraction function throws away informationAbstraction function throws away information

Control Flow ImprecisionControl Flow Imprecision One lattice value for all possible control flow pathsOne lattice value for all possible control flow paths Analysis result has a single lattice value to summarize Analysis result has a single lattice value to summarize

results of multiple concrete executionsresults of multiple concrete executions Join operation Join operation moves up in lattice to combine values moves up in lattice to combine values

from different execution pathsfrom different execution paths Typically if x Typically if x y, then x is more precise than y y, then x is more precise than y


Techniques”

Why Have ImprecisionWhy Have Imprecision Make analysis tractableMake analysis tractable Unbounded sets of values in Unbounded sets of values in

executionexecution Typically abstracted by finite set of Typically abstracted by finite set of

lattice valueslattice values Execution may visit unbounded set of Execution may visit unbounded set of

statesstates Abstracted by computing joins of Abstracted by computing joins of

different pathsdifferent paths


Techniques”

Abstraction FunctionAbstraction Function AF(s)[v] = sign of vAF(s)[v] = sign of v

AF(n,[aAF(n,[a5, b5, b0, c0, c-2]) = [a-2]) = [a+, b+, b0, c0, c-]-] Establishes meaning of the analysis Establishes meaning of the analysis

resultsresults If analysis says variable has a given signIf analysis says variable has a given sign Always has that sign in actual executionAlways has that sign in actual execution

Correctness condition: Correctness condition: v. AF(s)[v] v. AF(s)[v] in innn[v] (n is node for s)[v] (n is node for s) Reflects possibility of imprecisionReflects possibility of imprecision


Techniques”

Abstraction Function Abstraction Function SoundnessSoundness

Will showWill show v. AF(s)[v] v. AF(s)[v] in innn[v] (n [v] (n is node for s)is node for s)by induction on length of computation that by induction on length of computation that

produced sproduced s Base case:Base case:

v. inv. inn0n0[v] = TOP, which implies [v] = TOP, which implies thatthat

v. AF(s)[v] v. AF(s)[v] TOP TOP


Techniques”

Induction StepInduction Step Assume Assume v. AF(s)[v] v. AF(s)[v] in innn[v] for computations of length k[v] for computations of length k Prove for computations of length k+1Prove for computations of length k+1 Proof:Proof:

Given s (state), n (node to execute next), and inGiven s (state), n (node to execute next), and innn Find p (the node that just executed), sFind p (the node that just executed), spp(the previous state), (the previous state),

and inand inpp By induction hypothesis By induction hypothesis v. AF(s v. AF(spp)[v] )[v] in inpp[v][v] Case analysis on form of nCase analysis on form of n

If n of the form v = c, then If n of the form v = c, then s[v] = c and outs[v] = c and outp p [v] = sign(c), so[v] = sign(c), so AF(s)[v] AF(s)[v]

= sign(c) = out= sign(c) = outp p [v][v] in innn[v][v] If xIf xv, s[x] = sv, s[x] = sp p [x] and out[x] and outp p [x] = in[x] = inpp[x], so[x], so AF(s)AF(s)

[x] = AF(s[x] = AF(spp)[x] )[x] in inpp[x] = out[x] = outp p [x] [x] in innn[x][x] Similar reasoning if n of the form vSimilar reasoning if n of the form v11 = v = v22*v*v33


Techniques”

Augmented Execution Augmented Execution StatesStates

Abstraction functions for some Abstraction functions for some analyses require augmented analyses require augmented execution statesexecution states Reaching definitions: states are Reaching definitions: states are

augmented with definition that created augmented with definition that created each valueeach value

Available expressions: states are Available expressions: states are augmented with expression for each augmented with expression for each valuevalue


Techniques”

Meet Over Paths Meet Over Paths SolutionSolution

What solution would be ideal for a forward What solution would be ideal for a forward dataflow analysis problem? dataflow analysis problem?

Consider a path p = nConsider a path p = n00, n, n11, , ……, n, nkk, n to a node n , n to a node n (note that for all i n(note that for all i nii pred(n pred(ni+1i+1))))

The solution must take this path into account:The solution must take this path into account:ffpp ( () = (f) = (fnknk(f(fnk-1nk-1((……ffn1n1(f(fn0n0(()) )) ……)) )) in innn

So the solution must have the property that So the solution must have the property that {f{fpp ( () . p is a path to n} ) . p is a path to n} in innn

and ideallyand ideally

{f{fpp ( () . p is a path to n} = in) . p is a path to n} = innn


Techniques”

Soundness Proof of Soundness Proof of Analysis AlgorithmAnalysis Algorithm

Property to prove:Property to prove:For all paths p to n, fFor all paths p to n, fpp ( () ) in innn

Proof is by induction on length of pProof is by induction on length of p Uses monotonicity of transfer functionsUses monotonicity of transfer functions Uses following lemmaUses following lemma

Lemma:Lemma:Worklist algorithm produces a solution such thatWorklist algorithm produces a solution such that

ffnn(in(innn) = out) = outnn

if n if n pred(m) then out pred(m) then outnn in inmm


Techniques”

ProofProof Base case: p is of length 1Base case: p is of length 1

Then p = nThen p = n0 0 and fand fpp(() = ) = = in = inn0n0

Induction step:Induction step: Assume theorem for all paths of length kAssume theorem for all paths of length k Show for an arbitrary path p of length Show for an arbitrary path p of length

k+1k+1


Techniques”

Induction Step ProofInduction Step Proof p = np = n00, , ……, n, nkk, n, n Must show fMust show fkk(f(fk-1k-1((……ffn1n1(f(fn0n0(()) )) ……)) )) in innn

By induction (fBy induction (fk-1k-1((……ffn1n1(f(fn0n0(()) )) ……)) )) in innknk Apply fApply fk k to both sides, by monotonicity we to both sides, by monotonicity we

getget ffkk(f(fk-1k-1((……ffn1n1(f(fn0n0(()) )) ……)) )) f fkk(in(innknk) ) By lemma, fBy lemma, fkk(in(innknk) = out) = outnknk By lemma, outBy lemma, outnk nk in innn By transitivity, fBy transitivity, fkk(f(fk-1k-1((……ffn1n1(f(fn0n0(()) )) ……)) )) in innn


Techniques”

DistributivityDistributivity Distributivity preserves precisionDistributivity preserves precision If framework is distributive, then If framework is distributive, then

worklist algorithm produces the meet worklist algorithm produces the meet over paths solutionover paths solution For all n:For all n:

{f{fpp ( () . p is a path to n} = in) . p is a path to n} = innn


Techniques”

Lack of Distributivity Lack of Distributivity ExampleExample

Constant CalculatorConstant Calculator Flat Lattice on IntegersFlat Lattice on Integers

Actual lattice records a value for each Actual lattice records a value for each variablevariable Example element: [aExample element: [a3, b3, b2, c2, c5]5]

-1 10

TOP

BOT

-2 2 ……


Techniques”

Transfer FunctionsTransfer Functions If n of the form v = cIf n of the form v = c

ffnn(x) = x[v(x) = x[vc]c] If n of the form vIf n of the form v11 = v = v22+v+v33

ffnn(x) = x[v(x) = x[v11x[vx[v22] + x[v] + x[v33]]]] Lack of distributivityLack of distributivity

Consider transfer function f for c = a + bConsider transfer function f for c = a + b f([af([a3, b3, b2]) 2]) f([a f([a2, b2, b3]) = [a3]) = [aTOP, bTOP, bTOP, cTOP, c5]5] f([af([a3, b3, b2]2][a[a2, b2, b3]) = f([a3]) = f([aTOP, bTOP, bTOP]) = TOP]) =

[a[aTOP, bTOP, bTOP, cTOP, cTOP]TOP]


Techniques”

Lack of Distributivity Lack of Distributivity AnomalyAnomaly

a = 2b = 3

a = 3b = 2

[a3, b2][a2, b3]

[aTOP, bTOP]c = a+b

[aTOP, bTOP, c TOP]

Lack of Distributivity Imprecision: [aTOP, bTOP, c5] more precise

What is the meet over all paths solution?


Techniques”

How to Make Analysis How to Make Analysis DistributiveDistributive

Keep combinations of values on different Keep combinations of values on different pathspaths

a = 2b = 3

a = 3b = 2

{[a3, b2]}{[a2, b3]}

{[a2, b3], [a3, b2]} c = a+b

{[a2, b3,c5], [a3, b2,c5]}


Techniques”

IssuesIssues Basically simulating all combinations of Basically simulating all combinations of

values in all executionsvalues in all executions Exponential blowupExponential blowup Nontermination because of infinite Nontermination because of infinite

ascending chainsascending chains Nontermination solutionNontermination solution

Use widening operator to eliminate blowup Use widening operator to eliminate blowup (can make it work at granularity of (can make it work at granularity of variables)variables)

Loses precision in many casesLoses precision in many cases


Techniques”

SummarySummary Formal dataflow analysis frameworkFormal dataflow analysis framework

Lattices, partial ordersLattices, partial orders Transfer functions, joins and splitsTransfer functions, joins and splits Dataflow equations and fixed point solutionsDataflow equations and fixed point solutions

Connection with programConnection with program Abstraction function AF: S Abstraction function AF: S P P For any state s and program point n, AF(s) For any state s and program point n, AF(s)

ininnn Meet over all paths solutions, distributivityMeet over all paths solutions, distributivity


Techniques”

Next TimeNext Time Partial Redundancy EliminationPartial Redundancy Elimination

DragonBook DragonBook §9.59.5

data flow analysis foundations

Documents