
Post on 01-Apr-2015


Performance Measurement

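Assignment?

Timing

    #include <stddef.h>    /* NULL */
    #include <sys/time.h>  /* gettimeofday, struct timeval */

    /* Return the current wall-clock time in seconds. */
    double When()
    {
        struct timeval tp;
        gettimeofday(&tp, NULL);
        return ((double)tp.tv_sec + (double)tp.tv_usec * 1e-6);
    }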
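A minimal sketch of how the timing function might be used: read the clock before and after the region of interest and subtract. The workload `sum_to` and the wrapper `time_sum_to` are hypothetical names introduced only for illustration; `When()` is repeated so the sketch compiles on its own.

```c
#include <stddef.h>
#include <sys/time.h>

/* Wall-clock time in seconds, same as the slide's When(). */
double When(void)
{
    struct timeval tp;
    gettimeofday(&tp, NULL);
    return (double)tp.tv_sec + (double)tp.tv_usec * 1e-6;
}

/* Hypothetical workload: sum the integers 1..n. */
double sum_to(long n)
{
    double total = 0.0;
    for (long i = 1; i <= n; i++)
        total += (double)i;
    return total;
}

/* Time one call of sum_to(n). The sum is stored through *result
   so the compiler cannot discard the loop being timed. */
double time_sum_to(long n, double *result)
{
    double start = When();
    *result = sum_to(n);
    return When() - start;
}
```

The returned value is elapsed wall-clock time in seconds. Because gettimeofday reports the system clock, a single short measurement can be noisy, so it is usually worth timing a region that runs long enough to dwarf the clock resolution.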

Paper Schedule

– 22 Students
– 6 Days
– Look at the schedule and email me your preference. Quickly.

A Quantitative Basis for Design

Parallel programming is an optimization problem.

Must take into account several factors:
– execution time
– scalability
– efficiency

Also must take into account the costs:
– memory requirements
– implementation costs
– maintenance costs etc.

Mathematical performance models are used to assess these costs and predict performance.

Defining Performance

How do you define parallel performance?
What do you define it in terms of?
Consider
– Distributed databases
– Image processing pipeline
– Nuclear weapons testbed

Amdahl's Law

Every algorithm has a sequential component.
Sequential component limits speedup

Sequential Component = s
Maximum Speedup = 1/s

Amdahl's Law

[Plot: Speedup versus sequential component s]
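The bound can be sketched in a few lines of C. The slides give only the limit 1/s; the finite-processor form 1 / (s + (1 - s)/p) used here is the standard statement of Amdahl's Law and is an addition for illustration, as are the function names.

```c
/* Amdahl's Law: speedup on p processors when a fraction s
   (0 <= s <= 1) of the work is sequential. */
double amdahl_speedup(double s, int p)
{
    return 1.0 / (s + (1.0 - s) / (double)p);
}

/* Upper bound on speedup as p grows without limit: 1/s (s > 0). */
double amdahl_limit(double s)
{
    return 1.0 / s;
}
```

For example, with s = 0.25 the limit is 4, however many processors are added.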

What's wrong?

Works fine for a given algorithm.
– But what if we change the algorithm?
We may change algorithms to increase parallelism and thus eventually increase performance.
– May introduce inefficiency

Metrics for Performance

Speedup
Efficiency
Scalability
Others ...

Speedup

S = Speed_P / Speed_1

What is Speed?
What algorithm for Speed_1?
What is the work performed? How much work?

Two kinds of Speedup

Relative
– Uses parallel algorithm on 1 processor
– Most common
Absolute
– Uses best known serial algorithm
– Eliminates overheads in calculation.

Speedup

Algorithm A
– Serial execution time is 10 sec.
– Parallel execution time is 2 sec.
Algorithm B
– Serial execution time is 2 sec.
– Parallel execution time is 1 sec.
What if I told you A = B?
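The arithmetic behind the A and B comparison can be sketched as follows. This is one reading of the slide's question: if both algorithms solve the same problem, A's serial time is not the best known serial time, so A's absolute speedup should be measured against B's 2 seconds. The function names are illustrative.

```c
/* Relative speedup: the parallel algorithm run on 1 processor
   compared with the same algorithm run on p processors. */
double relative_speedup(double t_parallel_on_1, double t_parallel_on_p)
{
    return t_parallel_on_1 / t_parallel_on_p;
}

/* Absolute speedup: the best known serial algorithm compared
   with the parallel algorithm run on p processors. */
double absolute_speedup(double t_best_serial, double t_parallel_on_p)
{
    return t_best_serial / t_parallel_on_p;
}
```

With the slide's numbers, A has a relative speedup of 10/2 = 5 and B of 2/1 = 2, yet B finishes sooner. Measured against a best serial time of 2 seconds, A's parallel run has an absolute speedup of 2/2 = 1.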

Efficiency

E = S / p

The fraction of time a processor spends doing useful work

Cost (Processor-Time Product)

C = p * T_p        p = # processors

E = T_s / C
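The two definitions of efficiency agree, since S / p = (T_s / T_p) / p = T_s / (p * T_p) = T_s / C. A small sketch, with illustrative function names and made-up numbers in the note that follows:

```c
/* Efficiency from speedup: E = S / p. */
double efficiency(double speedup, int p)
{
    return speedup / (double)p;
}

/* Cost (processor-time product): C = p * T_p. */
double cost(int p, double t_p)
{
    return (double)p * t_p;
}

/* Efficiency from cost: E = T_s / C. */
double efficiency_from_cost(double t_s, double c)
{
    return t_s / c;
}
```

As a made-up example, T_s = 8 s and T_p = 2.5 s on p = 4 processors give S = 3.2, C = 10 processor-seconds, and E = 0.8 by either formula.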

Performance Measurement

Algorithm X achieved speedup of 10.8 on 12 processors.
– What is wrong?
A single point of reference is not enough!
What about asymptotic analysis?

Performance Measurement

There is not a perfect way to measure and report performance.
Wall clock time seems to be the best.
But how much work do you do?
Best Bet:
– Develop a model that fits experimental results.
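One way to fit a model to experimental results is a least-squares fit. The sketch below assumes a model of the form T(P) = a + b / P, where a is time that does not shrink with more processors and b is perfectly divisible work. That model form is an assumption for illustration, not something the slides prescribe, and `fit_time_model` is a hypothetical name.

```c
#include <stddef.h>

/* Least-squares fit of T(P) = a + b / P to n measured times.
   Linear regression of times[i] on x = 1 / procs[i].
   Returns 0 on success, -1 if the fit is degenerate
   (fewer than 2 points, or all processor counts equal). */
int fit_time_model(const int *procs, const double *times, size_t n,
                   double *a, double *b)
{
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
    for (size_t i = 0; i < n; i++) {
        double x = 1.0 / (double)procs[i];
        sx += x;
        sy += times[i];
        sxx += x * x;
        sxy += x * times[i];
    }
    double denom = (double)n * sxx - sx * sx;
    if (n < 2 || denom == 0.0)
        return -1;
    *b = ((double)n * sxy - sx * sy) / denom;
    *a = (sy - *b * sx) / (double)n;
    return 0;
}
```

Comparing the fitted curve with measurements at processor counts that were not used in the fit gives a quick check of whether the assumed model form is reasonable.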

Parallel Programming Steps

Develop algorithm
Develop a model to predict performance
If the performance looks ok then code
Check actual performance against model
Report the performance

Performance Evaluation

Identify the data
– Execution time
– Be sure to examine a range of data points

Design the experiments to obtain the data
– Make sure the experiment measures what you intend to measure.
– Remember: Execution time is max time taken.
– Repeat your experiments many times
– Validate data by designing a model

Report data
– Report all information that affects execution
– Results should be separate from Conclusions
– Present the data in an easily understandable format.
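Two of the points above, that execution time is the maximum time taken and that experiments should be repeated, can be sketched as small helpers. The names are illustrative, and gathering the per-process times (for instance over a message-passing library) is assumed to happen elsewhere.

```c
#include <stddef.h>

/* Execution time of one parallel run: the slowest process
   determines when the run finishes. Requires nprocs >= 1. */
double run_time(const double *per_process, size_t nprocs)
{
    double worst = per_process[0];
    for (size_t i = 1; i < nprocs; i++)
        if (per_process[i] > worst)
            worst = per_process[i];
    return worst;
}

/* Mean execution time over repeated runs. Requires nruns >= 1. */
double mean_time(const double *runs, size_t nruns)
{
    double sum = 0.0;
    for (size_t i = 0; i < nruns; i++)
        sum += runs[i];
    return sum / (double)nruns;
}
```

Reporting the spread of the repeated runs alongside the mean helps the reader judge how stable the measurement is.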

Finite Difference Example

Finite Difference Code
512 x 512 x 5 Elements
16 IBM RS6000 workstations
Connected via Ethernet

Finite Difference Model

Execution Time
– ExTime = (Tcomp + Tcomm)/P
Communication Time
– Tcomm = 2*lat + 4*bw*n*z
Computation Time
– Estimate using some sample runs
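The model can be written down directly as code. This sketch transcribes the two formulas as given; the function names are illustrative. Since bw multiplies the message size n*z, it is read here as a transfer time per element rather than a rate, which is an interpretation and not something the slides state.

```c
/* Communication time: Tcomm = 2*lat + 4*bw*n*z.
   lat: per-message latency; bw: time per element transferred
   (assumed reading); n, z: grid dimensions in the message. */
double t_comm(double lat, double bw, double n, double z)
{
    return 2.0 * lat + 4.0 * bw * n * z;
}

/* Execution time: ExTime = (Tcomp + Tcomm)/P, as on the slide. */
double ex_time(double t_comp, double t_comm_value, int p)
{
    return (t_comp + t_comm_value) / (double)p;
}
```

Tcomp would come from sample runs, as the slide suggests; any values plugged in for lat and bw are placeholders until they are measured on the target network.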

Estimated Performance

Finite Difference Example

What was wrong?

Ethernet
Change the computation of Tcomm
– Reduce the bandwidth
– Tcomm = 2*lat + 4*bw*n*z*P/2
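A sketch of the revised model, with illustrative function names. The slide attributes the mismatch to Ethernet; one reading is that the network is a shared medium, so the transfer term grows with the number of processors P. Note that when the revised Tcomm is divided by P in ExTime, the transfer term becomes 2*bw*n*z and no longer shrinks as processors are added.

```c
/* Revised communication time: Tcomm = 2*lat + 4*bw*n*z*P/2.
   bw is read as time per element transferred (assumed reading). */
double t_comm_shared(double lat, double bw, double n, double z, int p)
{
    return 2.0 * lat + 4.0 * bw * n * z * (double)p / 2.0;
}

/* ExTime = (Tcomp + Tcomm)/P using the revised Tcomm. */
double ex_time_shared(double t_comp, double lat, double bw,
                      double n, double z, int p)
{
    return (t_comp + t_comm_shared(lat, bw, n, z, p)) / (double)p;
}
```

For P = 2 the revised Tcomm equals the original one, and the two models diverge as P grows.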

Finite Difference Example
