Reducing number of operations: The joy of algebraic transformations
CS498DHP Program Optimization


Page 1:

Reducing number of operations: The joy of algebraic transformations

CS498DHP Program Optimization

Page 2:

Number of operations and execution time

• Fewer operations do not necessarily mean shorter execution time:
– because of scheduling in a parallel environment,
– because of locality,
– because of communication in a parallel program.

• Nevertheless, although it has to be applied carefully, reducing the number of operations is one of the important optimizations.

• In this presentation, we discuss transformations that reduce the number of operations or reduce the length of the schedule in an idealized parallel environment where communication costs are zero.

Page 3:

Scheduling

• Consider the expression tree below.
• It can be shortened by applying

– Associativity and commutativity: [a+h+b*(c+g+d*e*f)] or

– Associativity, commutativity and distributivity: [a+h+b*c+b*g+b*d*e*f].

• The third expression has the shortest tree of the three. This means that, with enough resources, the third expression is the fastest even though it has the most operations.

[Figure: the original expression tree over the operands a, h, b, c, g, d, e, f, before the transformations above.]
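As a rough check of that claim (my own count, not from the slides; assuming unit-time operations and unbounded parallelism):

\begin{aligned}
a+h+b*(c+g+d*e*f) &: \ 7 \text{ operations, minimum tree height } 5\\
a+h+b*c+b*g+b*d*e*f &: \ 9 \text{ operations, minimum tree height } 4
\end{aligned}

So the distributed form finishes in 4 steps instead of 5 when enough functional units are available, at the cost of two extra multiplications.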

Page 4:

Locality

• Consider:

do i=1,n
  c(i) = a(i)+b(i)+a(i)/b(i)
end do
…
do i=1,n
  x(i) = (a(i)+b(i))*t(i)+a(i)/b(i)
end do

do i=1,n
  d(i) = a(i)/b(i)
  c(i) = a(i)+b(i)+d(i)
end do
…
do i=1,n
  x(i) = (a(i)+b(i))*t(i)+d(i)
end do

• The second sequence executes fewer operations (a(i)/b(i) is computed only once), but, if n is large enough, it also incurs more cache misses because of the extra array d. (We assume that t is computed between the two loops, so that they cannot be fused.)

Page 5:

Communication in parallel programs

• Consider:

cobegin
  …
  do i=1,n
    a(i) = ..
  end do
  send a(1:n)
  …
//
  …
  receive a(1:n)
  …
coend

cobegin
  …
  do i=1,n
    a(i) = ..
  end do
  …
//
  …
  do i=1,n
    a(i) = ..
  end do
  …
coend

• The second version executes more operations (a is computed by both parallel tasks), but it may execute faster if the send operation is expensive.

Page 6:

Approaches to reducing cost of computation

• Eliminate (syntactically) redundant computations.

• Apply algebraic transformations to reduce the number of operations.

• Decompose sequential computations for parallel execution.

• Apply algebraic transformations to reduce the height of expression trees and thus reduce execution time in a parallel environment.

Page 7:

Elimination of redundant computations

• Many of the transformations were discussed in the context of compiler transformations (a small C sketch of the first two follows this list):
– Common subexpression elimination
– Loop invariant removal
– Elimination of redundant counters
– Loop unrolling (not discussed, but it should have been; it eliminates bookkeeping operations)
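A hedged before/after illustration in C; the function names, loop body, and variables are invented for this sketch and are not from the slides.

/* Before: a*b is a common subexpression, and (a*b)*(n*stride) is loop-invariant. */
double before(double a, double b, const double *v, int n, int stride) {
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += v[i]*(a*b) + (a*b)*(n*stride);
    return s;
}

/* After common subexpression elimination and loop-invariant removal:
   the redundant work is computed once, outside the loop. */
double after(double a, double b, const double *v, int n, int stride) {
    double ab  = a*b;               /* common subexpression, computed once */
    double inv = ab*(n*stride);     /* loop-invariant term, hoisted        */
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += v[i]*ab + inv;
    return s;
}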

Page 8:

• However, compilers will not eliminate all redundant computations. Here is an example where user intervention is needed. The following sequence

do i=1,n
  s = a(i)+s
end do
…
do i=1,n-1
  t = a(i)+t
end do
…t…

Page 9:

may be replaced by

do i=1,n-1
  t = a(i)+t
end do
s = t+a(n)
…t…

This transformation is not usually done by compilers.

Page 10:

2. Another example, from C, is the loop

for (i = 0; i < n; i++)
  for (j = 0; j < n; j++)
    a[i][j] = 0;

which, if a is n × n and stored contiguously, can be transformed into the loop below, which has fewer bookkeeping operations.

b = a;
for (i = 0; i < n*n; i++) { *b = 0; b++; }

Page 11:

Applying algebraic transformations to reduce the number of operations

• For example, the expression a*(b*c)+(b*a)*d+a*e can be transformed into (a*b)*(c+d)+a*e by distributivity, and then, by associativity and distributivity, into a*(b*(c+d)+e).

• Notice that associativity has to be applied with care. For example, suppose we are operating on floating-point values, that x is very much larger in magnitude than y, and that z = -x. Then (y+x)+z may give 0 as a result, while y+(x+z) gives y as the answer.
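A minimal C demonstration of the caveat (the particular values are my choice, not from the slides):

#include <stdio.h>

int main(void) {
    float x = 1.0e20f;              /* much larger in magnitude than y */
    float y = 1.0f;
    float z = -x;
    printf("%g\n", (y + x) + z);    /* prints 0: y is absorbed when added to x */
    printf("%g\n", y + (x + z));    /* prints 1: x and z cancel exactly first  */
    return 0;
}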

Page 12:

• The application of algebraic rules can be very sophisticated. Consider the computation of x^n. A naïve implementation would require n-1 multiplications.

• However, if we represent n in binary as n = b0 + 2(b1 + 2(b2 + …)) and notice that x^n = x^b0 · (x^(b1 + 2(b2 + …)))², the number of multiplications can be reduced to O(log n).

Page 13:

function power(x,n)            (assume n>0)
  if n==1 then return x
  if n%2==1 then return x*power(x,n-1)
  else x=power(x,n/2); return x*x
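A runnable C rendering of the pseudocode above; the integer types are my assumption, since the slide leaves them unspecified.

/* Exponentiation by squaring: O(log n) multiplications, assuming n > 0. */
long long power(long long x, unsigned long n) {
    if (n == 1) return x;
    if (n % 2 == 1) return x * power(x, n - 1);
    long long h = power(x, n / 2);   /* compute the half power once */
    return h * h;                    /* and square it               */
}

For example, power(3, 10) returns 59049 using 4 multiplications instead of 9.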

Page 14:

Horner’s rule

• A polynomial
A(x) = a0 + a1x + a2x² + a3x³ + ...
may be written as
A(x) = a0 + x(a1 + x(a2 + x(a3 + ...))).

As a result, a polynomial may be evaluated at a point x′ (that is, A(x′) computed) in Θ(n) time using Horner's rule: repeated multiplications and additions, rather than the naive method of raising x to each power, multiplying by the coefficient, and accumulating.
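A C sketch of Horner's rule; the coefficient layout a[0] = a0, ..., a[n-1] = a_(n-1) is my assumption.

/* Evaluates A(x) = a[0] + a[1]*x + ... + a[n-1]*x^(n-1)
   with n-1 multiplications and n-1 additions. */
double horner(const double a[], int n, double x) {
    double r = a[n - 1];
    for (int i = n - 2; i >= 0; i--)
        r = r * x + a[i];            /* one multiply and one add per coefficient */
    return r;
}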

Page 15:

Conventional matrix multiplication

Asymptotic complexity: 2n³ operations.
Each recursion step (blocked version): 8 multiplications, 4 additions.
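The block formulas did not survive extraction. The recursion step referred to above is presumably the standard 2×2 block decomposition:

\begin{aligned}
C_{11} &= A_{11}B_{11} + A_{12}B_{21}, & C_{12} &= A_{11}B_{12} + A_{12}B_{22},\\
C_{21} &= A_{21}B_{11} + A_{22}B_{21}, & C_{22} &= A_{21}B_{12} + A_{22}B_{22},
\end{aligned}

that is, 8 multiplications and 4 additions of (n/2)×(n/2) blocks per step; the recurrence T(n) = 8T(n/2) + 4(n/2)² solves to 2n³ − n² ≈ 2n³, matching the count above.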

                                                                        

                           

Page 16:

Strassen’s Algorithm

Asymptotic complexity: O(n^(log₂ 7)) = O(n^2.8…) operations.
Each recursion step: 7 multiplications, 18 additions/subtractions.

                                                                        

                                      

                              

Asymptotic complexity is the solution of T(n) = 7T(n/2) + 18(n/2)².
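The seven products were likewise lost in extraction. The standard formulation of Strassen's recursion step (presumably what the slide showed) is:

\begin{aligned}
M_1 &= (A_{11}+A_{22})(B_{11}+B_{22}), & M_2 &= (A_{21}+A_{22})B_{11},\\
M_3 &= A_{11}(B_{12}-B_{22}), & M_4 &= A_{22}(B_{21}-B_{11}),\\
M_5 &= (A_{11}+A_{12})B_{22}, & M_6 &= (A_{21}-A_{11})(B_{11}+B_{12}),\\
M_7 &= (A_{12}-A_{22})(B_{21}+B_{22}), & &\\
C_{11} &= M_1+M_4-M_5+M_7, & C_{12} &= M_3+M_5,\\
C_{21} &= M_2+M_4, & C_{22} &= M_1-M_2+M_3+M_6,
\end{aligned}

i.e., 7 block multiplications and 18 block additions/subtractions, matching the counts above.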

Page 17:

Winograd

Asymptotic complexity: O(n^2.8…) operations.
Each recursion step: 7 multiplications, 15 additions/subtractions.

                                                                        

                                                                                        

Page 18:

Parallel matrix multiplication

• Parallel matrix multiplication can be accomplished without redundant operations.

• First observe that the time to compute a sum of n elements, given enough resources, is ⌈log₂ n⌉ steps.

Page 19:

Time: [figure omitted]

Page 20:

Time: [figure omitted]

Page 21:

• With sufficient replication and computational resources, matrix multiplication can take just one multiplication step and ⌈log₂ n⌉ addition steps.
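A sketch of why (my summary, assuming n³ processors and zero-cost communication), for C = AB:

\begin{aligned}
\text{step 1:}\quad & p_{ikj} = a_{ik}\,b_{kj} \ \text{for all } i,j,k \text{ in parallel (one multiplication step)},\\
\text{then:}\quad & c_{ij} = \textstyle\sum_{k=1}^{n} p_{ikj} \ \text{by a balanced binary reduction tree, } \lceil\log_2 n\rceil \text{ addition steps}.
\end{aligned}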

Page 22:

Copying can also be done in logarithmic steps

Page 23:

Parallelism and redundancy

• Algebraic rules can be applied to reduce tree height.

• In some cases, the height of the tree is reduced at the expense of an increase in the number of operations.

Pages 24-28: [figures only; no text extracted]
Page 29:

Parallel Prefix

Pages 30-37: [figures only; no text extracted]
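The parallel-prefix slides are figures only, so here is a hedged C sketch of one standard scheme (a Hillis-Steele style inclusive scan; the slides may well have shown a different variant). Each round's inner loop has independent iterations, so it corresponds to one parallel step: the scan finishes in ⌈log₂ n⌉ rounds but performs O(n log n) additions, versus n-1 for the sequential scan, which illustrates the parallelism/redundancy trade-off above.

#include <stdio.h>
#include <string.h>

#define N 8

/* Inclusive prefix sum in ceil(log2 N) rounds.  The inner loop's
   iterations are independent, so each round is one parallel step. */
void prefix_sum(int x[N]) {
    int prev[N];
    for (int d = 1; d < N; d *= 2) {
        memcpy(prev, x, sizeof prev);      /* values from the previous round */
        for (int i = d; i < N; i++)
            x[i] = prev[i] + prev[i - d];
    }
}

int main(void) {
    int x[N] = {3, 1, 4, 1, 5, 9, 2, 6};
    prefix_sum(x);
    for (int i = 0; i < N; i++)
        printf("%d ", x[i]);               /* prints: 3 4 8 9 14 23 25 31 */
    printf("\n");
    return 0;
}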
Page 38:

Redundancy in parallel sorting. Sorting networks.

Page 39:

Comparator (2-sorter)

[Figure: a comparator takes inputs x and y and produces outputs min(x, y) and max(x, y).]
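A small C sketch of a comparator and of the odd-even transposition network listed in the table a few slides below (the choice of network is mine; any of the listed networks would do). Comparators within one stage touch disjoint pairs, so each stage could execute as one parallel step, and the comparison pattern is independent of the data values.

#include <stdio.h>

/* Comparator (2-sorter): smaller value goes to position i, larger to j. */
static void compare_exchange(int a[], int i, int j) {
    if (a[i] > a[j]) { int t = a[i]; a[i] = a[j]; a[j] = t; }
}

/* Odd-even transposition sort: n stages of roughly n/2 disjoint
   comparators each, i.e. O(n) stages and O(n^2) comparators in total. */
static void odd_even_transposition_sort(int a[], int n) {
    for (int stage = 0; stage < n; stage++)
        for (int i = stage % 2; i + 1 < n; i += 2)
            compare_exchange(a, i, i + 1);
}

int main(void) {
    int a[] = {1, 0, 0, 1, 0, 0, 1, 1};   /* a 0/1 input like those in the figures */
    odd_even_transposition_sort(a, 8);
    for (int i = 0; i < 8; i++)
        printf("%d", a[i]);                /* prints 00001111 */
    printf("\n");
    return 0;
}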

Page 40:

Comparison Network

[Figure: a comparison network operating on a 0/1 input; d stages, with n/2 comparisons per stage.]

Page 41:

Sorting Networks

[Figure: a sorting network maps the input 1,0,0,1,0,0,1,1 to the sorted output 0,0,0,0,1,1,1,1.]

Page 42:

Insertion Sort Network

[Figure: insertion sort as a sorting network; inputs on the left, outputs on the right; depth 2n - 3.]

Page 43:

Network                        Comparator stages    Comparators
Odd-even transposition sort    O(n)                 O(n²)
Bubblesort                     O(n)                 O(n²)
Bitonic sort                   O(log(n)²)           O(n·log(n)²)
Odd-even mergesort             O(log(n)²)           O(n·log(n)²)
Shellsort                      O(log(n)²)           O(n·log(n)²)