
Upload: ethan-keating

Post on 26-Mar-2015


Analyzing Parallel Performance

Intel Software College

Introduction to Parallel Programming – Part 6


Copyright © 2006, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.


Objectives

At the end of this module, you should be able to:

Define speedup and efficiency

Use Amdahl’s Law to predict maximum speedup

Use the Karp-Flatt metric to analyze parallel program performance and to predict speedup with additional processors


Speedup

Speedup is the ratio between sequential execution time and parallel execution time

For example, if the sequential program executes in 6 seconds and the parallel program executes in 2 seconds, the speedup is 3

Speedup curves look like this:

[Figure: speedup versus number of processors, plotted against the reference line y = x]


Efficiency

Efficiency is a measure of processor utilization: speedup divided by the number of processors

Example: a program achieves a speedup of 3 on 4 CPUs, so its efficiency is 3 / 4 = 75%

Efficiency curves look like this:

[Figure: efficiency versus number of processors, plotted against the reference line y = 1.0]
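The two definitions above can be checked with a few lines of Python. This is a minimal sketch; the function names `speedup` and `efficiency` are illustrative and do not come from the slides.

```python
def speedup(sequential_time, parallel_time):
    """Ratio of sequential execution time to parallel execution time."""
    return sequential_time / parallel_time


def efficiency(speedup_value, processors):
    """Speedup divided by the number of processors."""
    return speedup_value / processors


if __name__ == "__main__":
    # The 6-second sequential / 2-second parallel example from the Speedup slide.
    print(speedup(6.0, 2.0))       # 3.0
    # The speedup-of-3-on-4-CPUs example from the Efficiency slide.
    print(efficiency(3.0, 4))      # 0.75
```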


Idea Behind Amdahl’s Law

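[Figure: execution time versus number of processors. Each bar has a sequential part f that stays the same and a parallel part that shrinks from 1-f on one processor to (1-f)/2, (1-f)/3, (1-f)/4, and (1-f)/5 on two through five processors.]

f: portion of the computation that will be performed sequentially

1-f: portion of the computation that will be executed in parallel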


Derivation of Amdahl’s Law

Speedup is ratio of execution time on 1 processor to execution time on p processors

Execution time on 1 processor is f + (1-f)

Execution time on p processors is at least f + (1-f)/p

Writing ψ for speedup:

$$\psi \le \frac{f + (1-f)}{f + (1-f)/p} = \frac{1}{f + (1-f)/p}$$
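Taking the bound ψ ≤ 1/(f + (1-f)/p), with ψ the speedup, f the sequential fraction, and p the number of processors, a short limit argument shows why the sequential fraction caps the speedup however many processors are added. This is a standard consequence of the formula rather than something the slide states:

```latex
\[
\lim_{p \to \infty} \frac{1}{f + (1-f)/p} = \frac{1}{f}
\]
```

For example, with f = 0.2 the speedup can never exceed 1/0.2 = 5.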


Amdahl’s Law Is Too Optimistic

Amdahl’s Law ignores parallel processing overhead

Examples of this overhead include time spent creating and terminating threads

Parallel processing overhead is usually an increasing function of the number of processors


Graph with Parallel Overhead Added

[Figure: execution time versus number of processors, with parallel overhead added to each bar. Parallel overhead increases with the number of processors.]


Other Optimistic Assumptions

Amdahl’s Law assumes that the computation divides evenly among the processors

In reality, the work often does not divide evenly among the processors

Processor waiting time is another form of overhead

[Figure: timeline from "Task started" to "Task completed", showing each processor's working time and waiting time]
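A small sketch of why uneven work creates waiting time: the parallel phase ends only when the busiest processor finishes, so every other processor idles until then. The working times below are made-up numbers for illustration, and `waiting_times` is a hypothetical helper name.

```python
def waiting_times(work_per_processor):
    """Return (parallel_time, idle time per processor).

    The parallel phase lasts as long as the busiest processor works;
    every other processor waits for the difference.
    """
    parallel_time = max(work_per_processor)
    idle = [parallel_time - w for w in work_per_processor]
    return parallel_time, idle


if __name__ == "__main__":
    # Hypothetical working times in seconds for 4 processors (12 s of work in total).
    work = [4.0, 3.0, 3.0, 2.0]
    parallel_time, idle = waiting_times(work)
    ideal_time = sum(work) / len(work)  # the same work divided perfectly evenly
    print("parallel time:", parallel_time)
    print("ideal time:", ideal_time)
    print("waiting time per processor:", idle)
```

With these numbers the imbalanced run takes 4 seconds where a perfectly even split would take 3, which is the gap that Amdahl’s Law leaves out.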


Graph with Workload Imbalance Added

[Figure: execution time versus number of processors, with the time lost due to workload imbalance added to each bar]


More General Speedup Formula

ψ(n,p): speedup for a problem of size n on p CPUs

σ(n): time spent in the sequential portion of the code for a problem of size n

φ(n): time spent in the parallelizable portion of the code for a problem of size n

κ(n,p): parallel overhead

$$\psi(n,p) \le \frac{\sigma(n) + \varphi(n)}{\sigma(n) + \varphi(n)/p + \kappa(n,p)}$$


Amdahl’s Law: Maximum Speedup

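$$\psi(n,p) \le \frac{\sigma(n) + \varphi(n)}{\sigma(n) + \varphi(n)/p + \kappa(n,p)}$$

The overhead term κ(n,p) is set to 0

The term φ(n)/p assumes the parallel work divides perfectly among the available CPUs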
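The step from the general formula to Amdahl’s Law can be written out as follows. Here σ(n) is the sequential time, φ(n) the parallelizable time, and κ(n,p) the parallel overhead; defining f as the sequential share of the one-processor time is the usual convention and is assumed here rather than quoted from the slide.

```latex
\[
\kappa(n,p) = 0
\quad\Longrightarrow\quad
\psi(n,p) \le \frac{\sigma(n) + \varphi(n)}{\sigma(n) + \varphi(n)/p}
\]
\[
f = \frac{\sigma(n)}{\sigma(n) + \varphi(n)}
\quad\Longrightarrow\quad
\psi \le \frac{1}{f + (1-f)/p}
\]
```

The second line follows by dividing the numerator and the denominator of the first by σ(n) + φ(n).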


The Amdahl Effect

$$\psi(n,p) \le \frac{\sigma(n) + \varphi(n)}{\sigma(n) + \varphi(n)/p + \kappa(n,p)}$$

As n → ∞, the φ(n) and φ(n)/p terms dominate

Speedup is an increasing function of problem size


Illustration of the Amdahl Effect

[Figure: speedup versus number of processors, with curves for n = 1,000, n = 10,000, and n = 100,000 and a linear-speedup reference line]
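The Amdahl effect can be reproduced numerically with the general speedup formula. The cost functions below are invented purely for illustration (linear sequential time, quadratic parallelizable time, overhead that grows with n and log p); they are assumptions, not measurements from the slides.

```python
import math


def sigma(n):
    """Hypothetical sequential time: linear in the problem size."""
    return float(n)


def phi(n):
    """Hypothetical parallelizable time: quadratic in the problem size."""
    return float(n) ** 2


def kappa(n, p):
    """Hypothetical parallel overhead: grows with n and slowly with p."""
    return n * math.log2(p)


def speedup_bound(n, p):
    """General speedup formula: (sigma + phi) / (sigma + phi/p + kappa)."""
    return (sigma(n) + phi(n)) / (sigma(n) + phi(n) / p + kappa(n, p))


if __name__ == "__main__":
    for n in (10, 100, 1000):
        # With p fixed at 8, the bound rises toward 8 as n grows.
        print(n, round(speedup_bound(n, 8), 2))
```

Because the parallelizable term grows faster than the other two under these assumptions, the bound moves closer to linear speedup as the problem gets larger.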


Using Amdahl’s Law

Program executes in 5 seconds

Profile reveals 80% of time spent in function alpha, which we can execute in parallel

What would be the maximum speedup on 2 processors?

$$\psi \le \frac{1}{0.2 + (1 - 0.2)/2} = \frac{1}{0.6} \approx 1.67$$

New execution time ≥ 5 sec / 1.67 ≈ 3 seconds
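The same calculation as a small Python sketch. The function name `amdahl_speedup` is illustrative; f is the sequential fraction, so 80% parallelizable time gives f = 0.2.

```python
def amdahl_speedup(f, p):
    """Upper bound on speedup for sequential fraction f on p processors."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be between 0 and 1")
    if p < 1:
        raise ValueError("p must be at least 1")
    return 1.0 / (f + (1.0 - f) / p)


if __name__ == "__main__":
    f = 0.2                         # 20% of the 5-second run is sequential
    bound = amdahl_speedup(f, 2)
    print(round(bound, 2))          # 1.67
    print(round(5.0 / bound, 2))    # 3.0, the lower bound on the new execution time
```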


The Karp-Flatt Metric

Suppose we benchmark a parallel program and get these speedup figures:

Processors  Speedup  Efficiency
2           1.5      75%
3           1.8      60%
4           2.0      50%

Why is efficiency dropping?

How much speedup could we expect on 8 processors?


Deriving the Karp-Flatt Metric

$$\psi(n,p) \le \frac{\sigma(n) + \varphi(n)}{\sigma(n) + \varphi(n)/p + \kappa(n,p)}$$

The denominator represents parallel execution time

One processor does sequential code; others idle

All processors incur overhead time

“Wasted time” = (p-1)σ(n) + pκ(n,p)

Experimentally determined serial fraction e = “wasted time” divided by ((p-1) times the sequential execution time)
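The definition of the serial fraction as wasted time over (p-1) times sequential time leads to a formula in terms of measured speedup alone. A sketch of the algebra follows; the shorthand T_1 and T_p for one-processor and p-processor execution time is introduced here for convenience and is not the slides' notation, with σ(n), φ(n), and κ(n,p) standing for sequential time, parallelizable time, and parallel overhead.

```latex
\[
T_1 = \sigma(n) + \varphi(n), \qquad
T_p = \sigma(n) + \varphi(n)/p + \kappa(n,p), \qquad
\psi = T_1 / T_p
\]
\[
p\,T_p - T_1 = (p-1)\,\sigma(n) + p\,\kappa(n,p)
\]
\[
e = \frac{(p-1)\,\sigma(n) + p\,\kappa(n,p)}{(p-1)\,T_1}
  = \frac{p\,T_p - T_1}{(p-1)\,T_1}
  = \frac{p/\psi - 1}{p - 1}
  = \frac{1/\psi - 1/p}{1 - 1/p}
\]
```

The last form needs only the measured speedup ψ and the processor count p, which is what makes e an experimentally determined quantity.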


Karp-Flatt Metric

The experimentally determined serial fraction e is a function of speedup ψ and the number of processors p

$$e = \frac{1/\psi - 1/p}{1 - 1/p}$$

We can use e to determine whether efficiency decreases are due to

Sequential component of computation

Increases in overhead


How to Interpret “e”

If “e” is constant as the number of processors increases, then speedup is constrained by the sequential component of the computation

If “e” is increasing as the number of processors increases, then speedup is constrained by parallel overhead, such as:

Thread creation/termination time

Contention for shared data structures

Cache-related inefficiencies

Often a combination of the two factors


Going Back to Our Example

Processors  Speedup  Efficiency  e
2           1.5      75%         0.33
3           1.8      60%         0.33
4           2.0      50%         0.33

In this case, speedup is constrained by the relatively large amount of time spent in sequential code
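The e column can be recomputed from the speedup figures with the Karp-Flatt formula e = (1/ψ - 1/p) / (1 - 1/p). The sketch below uses an illustrative function name, `karp_flatt`, and the benchmark numbers from the table.

```python
def karp_flatt(speedup, p):
    """Experimentally determined serial fraction e for a speedup on p > 1 processors."""
    if p <= 1:
        raise ValueError("p must be greater than 1")
    return (1.0 / speedup - 1.0 / p) / (1.0 - 1.0 / p)


if __name__ == "__main__":
    benchmarks = [(2, 1.5), (3, 1.8), (4, 2.0)]  # (processors, speedup) from the table
    for p, s in benchmarks:
        # e rounds to 0.33 in every row, so it is constant as p grows.
        print(p, s, round(karp_flatt(s, p), 2))
```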


Example: Rectangle Rule Program

Benchmark data from an OpenMP program computing π using the rectangle rule:

Processors  Speedup  Efficiency  e
2           1.87     93%         0.070
3           2.60     87%         0.078
4           3.16     79%         0.089

We can predict speedup on 6 processors

Extrapolate e to be 0.11

Speedup would be 3.87


Speedup Prediction Formula

$$e = \frac{1/\psi - 1/p}{1 - 1/p} \quad\Longrightarrow\quad \psi = \frac{p}{e(p-1) + 1}$$
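Solving the Karp-Flatt formula for speedup gives ψ = p / (e(p-1) + 1), which is easy to evaluate for an extrapolated e. The sketch below uses an illustrative function name and the rectangle rule figures (e extrapolated to 0.11 on 6 processors).

```python
def predicted_speedup(e, p):
    """Speedup implied by serial fraction e on p processors: p / (e*(p-1) + 1)."""
    if p < 1:
        raise ValueError("p must be at least 1")
    return p / (e * (p - 1) + 1.0)


if __name__ == "__main__":
    # Rectangle rule example: e extrapolated to 0.11 on 6 processors.
    print(round(predicted_speedup(0.11, 6), 2))  # 3.87
```

Note that with e = 0 the formula returns p, the linear-speedup case.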


Case Study

We benchmark a sequential program and find it spends 85% of its time in functions we believe we can make parallel

We make these functions multithreaded and execute the program on a dual-core system

The parallel program achieves a speedup of 1.67 on 2 processors

If we can get access to a quad-core system, what kind of speedup should we expect?


Prediction Based on Amdahl’s Law

$$\psi \le \frac{1}{0.15 + (1 - 0.15)/4} \approx 2.76$$


Prediction Based on Karp-Flatt Metric

When p = 2, e = 0.20

We know 0.15 of e is sequential component

Rest of e (0.05) is parallel overhead

If parallel overhead increases linearly with number of processors (by 0.05 for each processor beyond the first), then it will be 0.15 when p = 4

We predict when p = 4, e = 0.30

Hence when p = 4, we predict speedup of 2.11
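The case study can be followed step by step in code. This is a sketch under the slide's own assumption that the overhead part of e grows by the same amount for each processor beyond the first; the function names are illustrative, and e is rounded to two places at p = 2 to follow the slide's arithmetic.

```python
def karp_flatt(speedup, p):
    """Experimentally determined serial fraction e."""
    return (1.0 / speedup - 1.0 / p) / (1.0 - 1.0 / p)


def predicted_speedup(e, p):
    """Speedup implied by serial fraction e on p processors."""
    return p / (e * (p - 1) + 1.0)


def extrapolate_e(e_measured, p_measured, sequential_part, p_target):
    """Extrapolate e, assuming its overhead part grows linearly with p - 1.

    This linear growth is an assumption taken from the case study, not a law.
    """
    overhead = e_measured - sequential_part
    per_extra_processor = overhead / (p_measured - 1)
    return sequential_part + per_extra_processor * (p_target - 1)


if __name__ == "__main__":
    e2 = round(karp_flatt(1.67, 2), 2)     # measured speedup of 1.67 on 2 processors
    e4 = extrapolate_e(e2, 2, 0.15, 4)     # 15% of the run is known to be sequential
    print("e at p = 2:", e2)
    print("e at p = 4:", round(e4, 2))
    print("predicted speedup at p = 4:", round(predicted_speedup(e4, 4), 2))
```

The prediction is noticeably lower than the 2.76 bound from Amdahl’s Law because it accounts for overhead that grows with the processor count.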


Superlinear Speedup

According to our general speedup formula, the maximum speedup a program can achieve on p processors is p

Superlinear speedup is the situation where speedup is greater than the number of processors used

It means the computational rate of the processors is faster when the parallel program is executing

Superlinear speedup usually occurs because the parallel program has a higher cache hit rate


References

Michael J. Quinn, Parallel Programming in C with MPI and OpenMP, McGraw-Hill (2004).
