dependence: theory and practice allen and kennedy, chapter 2 liza fireman

60
Dependence: Theory and Practice Allen and Kennedy, Chapter 2 Liza Fireman

Post on 22-Dec-2015

225 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Dependence: Theory and Practice Allen and Kennedy, Chapter 2 Liza Fireman

Dependence: Theory and Practice

Allen and Kennedy, Chapter 2

Liza Fireman

Page 2: Dependence: Theory and Practice Allen and Kennedy, Chapter 2 Liza Fireman

Optimizations

On scalar machines, the principal optimizations are register allocation, instruction scheduling, and reducing the cost of array address calculations.

The optimization process in parallel computers includes finding parallelism in a sequential code.

Parallel tasks perform similar operations on different elements of the data arrays.

Page 3: Dependence: Theory and Practice Allen and Kennedy, Chapter 2 Liza Fireman

Fortran Loops

DO I=1,N A(I)=B(I)+C

ENDDO

A(1:N)=B(1:N)+C

Page 4: Dependence: Theory and Practice Allen and Kennedy, Chapter 2 Liza Fireman

Fortran Loops

DO I=1,N A(I+1)=A(I)+B(I)

ENDDO

A(2:N+1)=A(1:N)+B(1:N)

Page 5: Dependence: Theory and Practice Allen and Kennedy, Chapter 2 Liza Fireman

Bernstein’s conditions

When is it safe to run two tasks R1 and R2 in parallel?If none of the following holds:1.R1 writes into a memory location that R2

reads2.R2 writes into a memory location that R1

reads3.Both R1 and R2 write to the same memory

location

Page 6: Dependence: Theory and Practice Allen and Kennedy, Chapter 2 Liza Fireman

Assumption

The compiler must have the ability to determine whether two different iterations access the same memory location.

Page 7: Dependence: Theory and Practice Allen and Kennedy, Chapter 2 Liza Fireman

Data dependence

S1 PI = 3.14S2 R = 5.0S3 AREA = PI * R ** 2

Data dependences from S1 and S2 to S3.

No execution constraint between S1 and S2.

S1 S2 S3

S2 S1 S3

S2

S3

S1

Page 8: Dependence: Theory and Practice Allen and Kennedy, Chapter 2 Liza Fireman

Control dependence

S1 IF (T != 0.0) Goto S3 S2 A = A / TS3 CONTINUE

S2 can not be executed before S1.

Page 9: Dependence: Theory and Practice Allen and Kennedy, Chapter 2 Liza Fireman

Data dependence

We can not interchange loads and stores to the same location.

There is a Data dependence between S1 and S2 iff

(1) S1 and S2 access the same memory location and at least one of the stores into it(2) there is a feasible run-time execution path from S1 to S2.

Page 10: Dependence: Theory and Practice Allen and Kennedy, Chapter 2 Liza Fireman

True dependence (RAW(

Antidependence (WAR)

S1 X… = S2 … = X

S1 … = XS2 X… =

S1S2

S1-1S2

S1 S2

S1 S2-1

Page 11: Dependence: Theory and Practice Allen and Kennedy, Chapter 2 Liza Fireman

Output dependence (WAW)

S1 X… = S2 X… =

S1 X = 3…S3 X = 5S4 W = X * Y

S1 S2o

S1oS2

Page 12: Dependence: Theory and Practice Allen and Kennedy, Chapter 2 Liza Fireman

DO I=1,NS A(I+1)=A(I)+B(I)ENDDO

DO I=1,NS A(I+2)=A(I)+B(I)ENDDO

S

Page 13: Dependence: Theory and Practice Allen and Kennedy, Chapter 2 Liza Fireman

Iteration number

When I=L the iteration number is 1. When I=L+S the iteration number is 2. When I=L+J*S the iteration number is J+1.

DO I=L,U,S …

ENDDO

Page 14: Dependence: Theory and Practice Allen and Kennedy, Chapter 2 Liza Fireman

Nesting level

• In a loop nest, the nesting level of a specific loop is equal to the number of loop that enclose it. DO I=1,N

DO J=1,MDO K=1,L

S A(I+1,J,K)=A(I,J,K)+10 ENDDO

ENDDOENDDO

Page 15: Dependence: Theory and Practice Allen and Kennedy, Chapter 2 Liza Fireman

Iteration Vector

• Given a nest of n loops, the iteration vector of particular iteration is a vector i={i1,i2,…in} where ik 1≤k≤n represents the iteration number for the loop at nesting level k.

DO I=1,2 DO J=1,3

S… ENDDO

ENDDO S[(2,1)]

){1,1),(1,2,()1,3),(2,1,()2,2),(2,3}(

Page 16: Dependence: Theory and Practice Allen and Kennedy, Chapter 2 Liza Fireman

Iteration Vector – Lexicographic orderIteration i precedes iteration j,

denoted i < j, iff (1) i[1:n-1] < j[1:n-1] Or(2) i[1:n-1]=j[1:n-1] and in < jn

Page 17: Dependence: Theory and Practice Allen and Kennedy, Chapter 2 Liza Fireman

Loop dependence

There exists a dependence from statement S1 to S2 in a common nest of loops iff there exist two iteration vectors i and j such that(1) i < j or i = j and there is a path from S1 to S2 in the body of the loop

(2) S1 accesses memory location M on iteration i and S2 accesses M on iteration j

(3) one of these accesses is a write

Page 18: Dependence: Theory and Practice Allen and Kennedy, Chapter 2 Liza Fireman

Transformations

A transformation is “safe” if the transformed program has the same “meaning”.

The transformation should have no effect of the outputs.

A reordering transformation is any program transformation that merely changes the order of execution, without adding or deleting any executions of any statements.

Page 19: Dependence: Theory and Practice Allen and Kennedy, Chapter 2 Liza Fireman

Transformations – cont.

A reordering transformation does not eliminate dependences

It can change the ordering of the dependence but it will lead to incorrect behavior

A reordering transformation preserves a dependence if it preserves the relative execution order of the source and sink of that dependence.

Page 20: Dependence: Theory and Practice Allen and Kennedy, Chapter 2 Liza Fireman

Fundamental Theorem of Dependence Any reordering transformation that

preserves every dependence in a program preserves the meaning of that program

A transformation is said to be valid for the program to which it applies if it preserves all dependences in the program.

Page 21: Dependence: Theory and Practice Allen and Kennedy, Chapter 2 Liza Fireman

Example

L0 DO I=1,NL1 DO J=1,2S0 A(I,J)=A(I,J)+B

ENDDOS1 T=A(I,1)S2 A(I,1)=A(I,2)S3 A(I,2)=T

ENDDO

Page 22: Dependence: Theory and Practice Allen and Kennedy, Chapter 2 Liza Fireman

Distance Vector Consider a dependence in a loop nest of n

loopso Statement S1 on iteration i is the source of the

dependenceo Statement S2 on iteration j is the sink of the

dependence

The distance vector is a vector of length n d(i,j) such that: d(i,j)k = jk - ik

Note that the iteration vectors are normalized.

Page 23: Dependence: Theory and Practice Allen and Kennedy, Chapter 2 Liza Fireman

Example

DO J=1,10 DO I=1,99

S1 A(I,J)=B(I,J)+XS2 C(I,J)=A(100-I,J)+Y

ENDDOENDDOThere are total of 50 different distances

for true dependences from S1 to S2

And 49 distances for anti-dependences from S2 to S1

Page 24: Dependence: Theory and Practice Allen and Kennedy, Chapter 2 Liza Fireman

Direction Vector

Suppose that there is a dependence from statement S1 on iteration i of a loop nest of n loops and statement S2 on iteration j, then the dependence direction vector is D(i,j) is defined as a vector of length n such that

”<“ if d(i,j)k > 0 ”=“ if d(i,j)k = 0

”>“ if d(i,j)k < 0

D(i,j)k=

Page 25: Dependence: Theory and Practice Allen and Kennedy, Chapter 2 Liza Fireman

Distance and Direction Vectors - Example

DO I=1,N DO J=1,M

DO K=1,LS A(I+1,J,K-1)=A(I,J,K)+10

ENDDO ENDDO

ENDDO

S has a true dependence on itself

Distance Vector: (1, 0, -1)

Direction Vector)> ,= ,<( :

S

Page 26: Dependence: Theory and Practice Allen and Kennedy, Chapter 2 Liza Fireman

Distance and Direction Vectors – cont. A dependence cannot exist if it has a

direction vector whose leftmost non "=" component is not "<" as this would imply that the sink of the dependence occurs before the source.

Page 27: Dependence: Theory and Practice Allen and Kennedy, Chapter 2 Liza Fireman

Direction Vector Transformation Let T be a transformation that is applied to

a loop nest and that does not rearrange the statements in the body of the loop. Then the transformation is valid if, after it is applied, none of the direction vectors for dependences with source and sink in the nest has a leftmost non- “=” component that is “>”.

Follows from Fundamental Theorem of Dependence:o All dependences existo None of the dependences have been reversed

Page 28: Dependence: Theory and Practice Allen and Kennedy, Chapter 2 Liza Fireman

Loop-carried Dependence

Statement S2 has a loop-carried dependence on statement S1 if and only if S1 references location M on iteration i, S2 references M on iteration j and d(i,j) > 0 (that is, D(i,j) contains a “<” as leftmost non “=” component).

In other words, when S1 and S2 execute on different iterations.

DO I=1,NS1 A(I+1)=F(I)S2 F(I+1)=A(I)ENDDO

Page 29: Dependence: Theory and Practice Allen and Kennedy, Chapter 2 Liza Fireman

Level of Loop-carried Dependence Level of a loop-carried dependence

is the index of the leftmost non-“=” of D(i,j) for the dependence.

A level-k dependence between S1 and S2 is denoted by S1 k S2

DO I=1,N DO J=1,M

DO K=1,LS A(I,J,K+1)=A(I,J,K)+10

ENDDO ENDDO

ENDDO

Direction vector for S1 is (=, =, <)

Level of the dependence is 3

S

3

Page 30: Dependence: Theory and Practice Allen and Kennedy, Chapter 2 Liza Fireman

Loop-carried Transformations Any reordering transformation that does not

alter the relative order of any loops in the nest and preserves the iteration order of the level-k loop preserves all level-k dependences.

Proof:– D(i, j) has a “<” in the kth position and “=” in

positions 1 through k-1 Source and sink of dependence are in the same

iteration of loops 1 through k-1 Cannot change the sense of the dependence by

a reordering of iterations of those loops

Page 31: Dependence: Theory and Practice Allen and Kennedy, Chapter 2 Liza Fireman

Loop-carried Transformations – cont.DO I=1,NS1 A(I+1)=F(I)S2 F(I+1)=A(I)ENDDO

DO I=1,NS1 F(I+1)=A(I)S2 A(I+1)=F(I)ENDDO

Page 32: Dependence: Theory and Practice Allen and Kennedy, Chapter 2 Liza Fireman

DO I=1,N DO J=1,M

DO K=1,LS A(I,J,K+1)=A(I,J,K)+10

ENDDO ENDDO

ENDDO DO I=1,N DO J=M,1

DO K=1,LS A(I,J,K+1)=A(I,J,K)+10

ENDDO ENDDO

ENDDO

Page 33: Dependence: Theory and Practice Allen and Kennedy, Chapter 2 Liza Fireman

Loop-independent Dependence Statement S2 has a loop-independent

dependence on statement S1 if and only if there exist two iteration vectors i and j such that:

1 )Statement S1 refers to memory location M on iteration i, S2 refers to M on iteration j, and i = j.

2 )There is a control flow path from S1 to S2 within the iteration.

In other words, when S1 and S2 execute on the same iteration.

Page 34: Dependence: Theory and Practice Allen and Kennedy, Chapter 2 Liza Fireman

Loop-independent Dependence - Example S2 depends on S1 with a loop

independent dependence is denoted by S1 S2

Note that the direction vector will have entries that are all “=” for loop independent dependences

DO I=1,NS1 A(I)...=

S2 ...=A(I)ENDDO

S1 S2∞

Page 35: Dependence: Theory and Practice Allen and Kennedy, Chapter 2 Liza Fireman

Loop-independent Dependence - Example

DO I=1,9S1 A(I)...=

S2 ...=A(10-I)ENDDO

Page 36: Dependence: Theory and Practice Allen and Kennedy, Chapter 2 Liza Fireman

Loop-independent Dependence - Example

DO I=1,10S1 A(I)...=

ENDDODO I=1,10S2 ...=A(20-I)ENDDO

Page 37: Dependence: Theory and Practice Allen and Kennedy, Chapter 2 Liza Fireman

Loop-independent Dependence Transformation If there is a loop-independent dependence

from S1 to S2, any reordering transformation that does not move statement instances between iterations and preserves the relative order of S1 and S2 in the loop body preserves that dependence.

Follows from Fundamental Theorem of Dependence:o All dependences existo None of the dependences have been reversed

Page 38: Dependence: Theory and Practice Allen and Kennedy, Chapter 2 Liza Fireman

Simple Dependence Testing

DO I1=L1,U1,S1

DO I2=L2,U2,S2

... DO In=Ln,Un,Sn

S1 A(f1(i1,…,in),…, fm(i1,in))...=S2 ...=A(g1(i1,…in),…,gm(i1,…in))

ENDDO... ENDDO

Page 39: Dependence: Theory and Practice Allen and Kennedy, Chapter 2 Liza Fireman

Loop-independent Dependence Transformation• A dependence exists from S1 to S2 if

and only if there exist values of a and b such that (1) a is lexicographically less than or equal to b and (2) the following system of dependence equations is satisfied:

fi(a) = gi(b) for all i, 1 i m

Page 40: Dependence: Theory and Practice Allen and Kennedy, Chapter 2 Liza Fireman

Simple Dependence Testing: Delta Notation Notation represents index values at the

source and sink Iteration at source denoted by: I0 Iteration at sink denoted by: I0 + I

Forming an equality gets us: I0 + 1 = I0 + I Solving this gives us: I = 1

Carried dependence with distance vector (1) and direction vector (<)

DO I=1,NS A(I+1)=A(I)+BENDDO

Page 41: Dependence: Theory and Practice Allen and Kennedy, Chapter 2 Liza Fireman

Simple Dependence Testing: Delta Notation

DO I=1,100 DO J=1,100

DO K=1,100 S A(I+1,J,K)=A(I,J,K+1)+10

ENDDO ENDDO

ENDDO

I0 + 1 = I0 + I; J0 = J0 + J; K0 = K0 + K + 1

I=1;J=0;K=-1

Corresponding direction vector: (<, =, >)

Page 42: Dependence: Theory and Practice Allen and Kennedy, Chapter 2 Liza Fireman

Simple Dependence Testing: Delta Notation - cont.

• If a loop index does not appear, its distance is unconstrained and its direction is “*”

DO I=1,100 DO J=1,100

S A(I+1)=A(I)+B(J) ENDDO

ENDDOCorresponding direction vector: (<,*)

Page 43: Dependence: Theory and Practice Allen and Kennedy, Chapter 2 Liza Fireman

Simple Dependence Testing: Delta Notation – cont.

DO J=1,100 DO I=1,100

S A(I+1)=A(I)+B(J) ENDDO

ENDDO

Corresponding direction vector: (*,<)

(*,<) = { (<, <), (=, <), (>, <) }

(>, <) denotes a level 1 antidependence

with direction vector (<, >)

Page 44: Dependence: Theory and Practice Allen and Kennedy, Chapter 2 Liza Fireman

Parallelization and Vectorization• It is valid to convert a sequential

loop to a parallel loop if the loop carries no dependence.

DO I=1,N X(I)=X(I)+C

ENDDO

X(1:N)=X(1:N)+C ✔

Page 45: Dependence: Theory and Practice Allen and Kennedy, Chapter 2 Liza Fireman

Parallelization and Vectorization

DO I=1,N X(I+1)=X(I)+C

ENDDO

X(2:N+1)=X(1:N)+C ✘

Page 46: Dependence: Theory and Practice Allen and Kennedy, Chapter 2 Liza Fireman

Loop Distribution

Can statements in loops which carry dependences be vectorized?DO I=1,NS1 A(I+1)=B(I)+C

S2 D(I)=A(I)+EENDDO

S1S2

DO I=1,NS1 A(I+1)=B(I)+CENDDODO I=1,N

S2 D(I)=A(I)+EENDDO

A(2:N+1)=B(1:N)+CD(1:N)=A(1:N)+E

Page 47: Dependence: Theory and Practice Allen and Kennedy, Chapter 2 Liza Fireman

Loop Distribution – cont.

Loop distribution fails if there is a cycle of dependencesDO I=1,NS1 A(I+1)=B(I)+C

S2 B(I+1)=A(I)+EENDDO

S11S2 and S21S1

S1 S2

Page 48: Dependence: Theory and Practice Allen and Kennedy, Chapter 2 Liza Fireman

ExampleDO I=1,100S1 X(I)=Y(I)+10

DO J=1,100S2 B(J)=A(J,N)

DO K=1,100 S3 A(J+1,K)=B(J)+C(J,K)

ENDDOS4 Y(I+J)=A(J+1,N)

ENDDOENDDO

Page 49: Dependence: Theory and Practice Allen and Kennedy, Chapter 2 Liza Fireman

DO I=1,100S1 X(I)=Y(I)+10

DO J=1,100S2 B(J)=A(J,N)

DO K=1,100 S3 A(J,K+1)=A(J+1,N)+10

ENDDOS4 Y(I+J)=A(J+1,N)

ENDDOENDDO

Advanced Vectorization Algorithm – cont.

S1S2

S3S4

1

10

101

0

2, 1,1-1

1,1-1

1-1

1

Page 50: Dependence: Theory and Practice Allen and Kennedy, Chapter 2 Liza Fireman

Simple Vectorization Algorithm

procedure vectorize (L, D)// L is the maximal loop nest // D is the dependence graph for statements in L find the set {S1, S2, ... , Sm} of maximal strongly-connected components in D reducing each Si to a single node compute the dependence graph naturally induced use topological sort

for i = 1 to m do begin if Si is a dependence cycle then

generate a DO-loop around the statements in Si; else

directly rewrite Si vectorized with respect to every loop containing it;

endend vectorize

Page 51: Dependence: Theory and Practice Allen and Kennedy, Chapter 2 Liza Fireman

DO I=1,100S1 X(I)=Y(I)+10

DO J=1,100S2 B(J)=A(J,N)

DO K=1,100 S3 A(J,K+1)=B(J)+C(J,K)

ENDDOS4 Y(I+J)=A(J+1,N)

ENDDOENDDO

Simple Vectorization Algorithm – cont.

10

10

10

S1S2

S3S4

1

2, 1,1-1

1,1-1

1-1

1

Page 52: Dependence: Theory and Practice Allen and Kennedy, Chapter 2 Liza Fireman

DO I=1,100 DO J=1,100

S2 B(J)=A(J,N)DO K=1,100

S3 A(J,K+1)=B(J)+C(J,K) ENDDO

S4 Y(I+J)=A(J+1,N) ENDDO

ENDDOX(1:100)=Y(1:100)+10

Simple Vectorization Algorithm – cont.

10

10

10

S1S2

S3S4

1

2, 1,1-1

1,1-1

1-1

1

Page 53: Dependence: Theory and Practice Allen and Kennedy, Chapter 2 Liza Fireman

Problems With Simple Vectorization

DO I=1,N DO J=1,M

S A(I+1,J)=A(I,J)+B ENDDO

ENDDO

Dependence from S to itself with d(i, j) = (1,0)

DO I=1,NS A(I+1,1:M)=A(I,1:M)+B

ENDDO The simple algorithm does

not capitalize on such opportunities

S

1

Page 54: Dependence: Theory and Practice Allen and Kennedy, Chapter 2 Liza Fireman

Advanced Vectorization Algorithm

procedure codegen(R, k, D); //R is the region for which we must generate code.

//k is the minimum nesting level of possible parallel loops . //D is the dependence graph among statements in R ..

find the set {S1, S2, ... , Sm} of maximal strongly-connected components inD restricted to R;

reduce each Si to a single node and compute the dependence graph induced use topological sort for i = 1 to m do beginif Si is cyclic then begingenerate a level-k DO statement;let Di be the dependence graph Si eliminating all dependence edges that are at level k codegen (Ri, k+1, Di);generate the level-k ENDDO statement;endelsegenerate a vector statement for Si in r(Si)-k+1 dimensions ,where r(Si) is the number of loops containing Si;end

Page 55: Dependence: Theory and Practice Allen and Kennedy, Chapter 2 Liza Fireman

Advanced Vectorization Algorithm – cont.DO I=1,100S1 X(I)=Y(I)+10

DO J=1,100S2 B(J)=A(J,N)

DO K=1,100 S3 A(J,K+1)=B(J)+C(J,K)

ENDDOS4 Y(I+J)=A(J+1,N)

ENDDOENDDO

Page 56: Dependence: Theory and Practice Allen and Kennedy, Chapter 2 Liza Fireman

DO I=1,100S1 X(I)=Y(I)+10

DO J=1,100S2 B(J)=A(J,N)

DO K=1,100 S3 A(J,K+1)=A(J+1,N)+10

ENDDOS4 Y(I+J)=A(J+1,N)

ENDDOENDDO

Advanced Vectorization Algorithm – cont.

S1S2

S3S4

1

10

101

0

2, 1,1-1

1,1-1

1-1

1

Page 57: Dependence: Theory and Practice Allen and Kennedy, Chapter 2 Liza Fireman

DO I=1,100 codegen(}S2,S3,S4{,2)

ENDDOX(1:100)=Y(1:100)+10

Advanced Vectorization Algorithm – cont.

10

10

10

S1S2

S3S4

1

2, 1,1-1

1,1-1

1-1

1

Page 58: Dependence: Theory and Practice Allen and Kennedy, Chapter 2 Liza Fireman

DO I=1,100 codegen(}S2,S3,S4{,2)

ENDDOX(1:100)=Y(1:100)+10

Advanced Vectorization Algorithm – cont.

S2

S3S4

2

Page 59: Dependence: Theory and Practice Allen and Kennedy, Chapter 2 Liza Fireman

DO I=1,100 DO J=1,100

codegen(}S2,S3{,3) ENDDO

Y(I+1:I+100)=A(2:101,N)ENDDOX(1:100)=Y(1:100)+10

Advanced Vectorization Algorithm – cont.

S2

S3S4

2

Page 60: Dependence: Theory and Practice Allen and Kennedy, Chapter 2 Liza Fireman

Advanced Vectorization Algorithm – cont.

DO I=1,100 DO J=1,100 B(J) = A(J,N) A(J+1,1:100)=B(J)+C(J,1:100) ENDDO Y(I+1:I+100)=A(2:101,N)ENDDOX(1:100)=Y(1:100)+10

S2

S3