
Measurement & Performance

CMPT 745 Software Engineering
Nick Sumner [email protected]


Performance & Measurement

● Real development must manage resources
  – Time
  – Memory
  – Open connections
  – VM instances
  – Energy consumption
  – ...

● Resource usage is one form of performance
  – Performance – a measure of nonfunctional behavior of a program

● We often need to assess performance or a change in performance
  (e.g. Data Structure A vs. Data Structure B)

How would you approach this in a data structures course?


Performance & Measurement

● Performance assessment is deceptively hard [Demo/Exercise]
  – Modern systems involve complex actors
  – Theoretical models may be too approximate
  – Even with the best intentions we can deceive ourselves

● Good performance evaluation should be rigorous & scientific
  – The same process applies in development as in good research
    1) Clear claims
    2) Clear evidence
    3) Correct reasoning from evidence to claims
  – And yet this is challenging to get right!


Performance & Measurement [Blackburn et al.]

  Scope of Evaluation  ←→  Scope of Claim/Conclusion

(The validity of a conclusion depends on how well the scope of the evaluation matches the scope of the claim.)


Performance & Measurement [Blackburn et al.]

● Inscrutability
  – Lack of clarity on actors or relationships
  – Omission, Ambiguity, Distortion

● Irreproducibility
  – Lack of clarity in steps taken or data
  – Causes:
    ● Omission of steps
    ● Incomplete understanding of factors
    ● Confidentiality & omission of data

Example ...


Performance & Measurement [Blackburn et al.]

  static int i = 0, j = 0, k = 0;

  int main() {
    int g = 0, inc = 1;
    for (; g < 65536; g++) {
      i += inc;
      j += inc;
      k += inc;
    }
    return 0;
  }

Compare gcc -O2 vs -O3.

One person may see a deterministic improvement.
Another may see a deterministic degradation.
Both are right.
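To try this yourself, one might time both builds directly (file and binary names here are illustrative):

  gcc -O2 loop.c -o loop_o2 && gcc -O3 loop.c -o loop_o3
  time ./loop_o2
  time ./loop_o3

Which build wins can depend on factors outside the source, such as code placement and even the size of the process environment, so two people can each observe a consistent yet opposite result.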


Performance & Measurement [Blackburn et al.]

● Ignorance – disregarding data or evidence against a claim
  – Ignoring data points
  – Ignoring distributions

[Figure: Gmail latency distribution]

If we reason about average latency, why is it misleading?
What is better?


Performance & Measurement [Blackburn et al.]

● Inappropriateness – claim is derived from facts not present
  – Bad metrics (e.g. execution time vs. power)
  – Biased samples
  – ...

● Inconsistency – comparing apples to oranges
  – Workload variation (e.g. learner effects, time of day)
  – Incompatible measures (e.g. performance counters across platforms)

Assessing Performance


Benchmarking

● We must reason rigorously about performance during assessment, investigation, & improvement

● Assessing performance is done through benchmarking
  – Microbenchmarks
    ● Focus on cost of an operation in isolation
    ● Can help identify core performance details & explain causes
  – Macrobenchmarks
    ● Real world system performance
  – Workloads (inputs) must be chosen carefully either way.
    ● representative, pathological, scenario driven, ...

Let's dig into a common approach to consider issues


Benchmarking

● Suppose we want to run a microbenchmark:

  startTime = getCurrentTimeInSeconds();
  doWorkloadOfInterest();
  endTime = getCurrentTimeInSeconds();
  reportResult(endTime – startTime);

What possible issues do you observe?
  – Granularity of measurement
  – Warm up effects
  – Nondeterminism
  – Size of workload
  – System interference
  – Frequency scaling?
  – Interference of other workloads?
  – Alignment?


Benchmarking

● Granularity & Units
  – Why is granularity a problem?
  – What are alternatives to getCurrentTimeInSeconds()?
  – What if I want to predict performance on a different machine?

● Using cycles instead of wall clock time can be useful, but has its own limitations
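As an illustration (a sketch, not from the slides): in C++ one can read a monotonic nanosecond clock through std::chrono, or the x86 time-stamp counter for cycle-level granularity, with the portability and frequency-scaling caveats above. The loop body is just a stand-in workload.

  #include <chrono>
  #include <cstdint>
  #include <cstdio>
  #include <x86intrin.h>  // __rdtsc; x86-specific

  int main() {
    volatile std::uint64_t sink = 0;

    // Nanosecond-granularity wall-clock time from a monotonic clock.
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < 1000000; ++i) sink += i;
    auto end = std::chrono::steady_clock::now();
    auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(end - start);
    std::printf("%lld ns\n", static_cast<long long>(ns.count()));

    // Cycle counts from the time-stamp counter: finer grained, but affected
    // by frequency scaling and not directly comparable across machines.
    std::uint64_t c0 = __rdtsc();
    for (int i = 0; i < 1000000; ++i) sink += i;
    std::uint64_t c1 = __rdtsc();
    std::printf("%llu cycles\n", static_cast<unsigned long long>(c1 - c0));
    return 0;
  }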


Benchmarking

● Warm up time
  – Why is warm up time necessary in general?
  – Why is it especially problematic for systems like Java?
  – How can we modify our example to facilitate this?

  for (…) doWorkloadOfInterest();   // warm up first
  startTime = getCurrentTimeInSeconds();
  doWorkloadOfInterest();
  endTime = getCurrentTimeInSeconds();
  reportResult(endTime – startTime);


Benchmarking

● Nondeterministic behavior
  – Will getCurrentTimeInSeconds() always return the same number? Why/why not?
  – So what reflects a meaningful result?
    ● Hint: The Law of Large Numbers!
    ● By running the same test many times, the arithmetic mean will converge on the expected value

Is this always what you want?


Benchmarking

● A revised (informal) approach:

  for (…) doWorkloadOfInterest();   // warm up first
  startTime = getCurrentTimeInNanos();
  for (…) doWorkloadOfInterest();   // time many repetitions
  endTime = getCurrentTimeInNanos();
  reportResult(endTime – startTime);

● This still does not solve everything
  – Frequency scaling?
  – Interference of other workloads?
  – Alignment?
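A self-contained C++ rendering of this informal scheme (a sketch; workloadOfInterest and the iteration counts are placeholders for the code under test):

  #include <chrono>
  #include <cstdio>
  #include <vector>

  // Placeholder for the operation being benchmarked.
  static void workloadOfInterest() {
    volatile long sum = 0;
    for (long i = 0; i < 100000; ++i) sum += i;
  }

  int main() {
    constexpr int kWarmup = 100;
    constexpr int kSamples = 30;
    constexpr int kRepsPerSample = 100;

    for (int i = 0; i < kWarmup; ++i) workloadOfInterest();  // warm up first

    std::vector<double> samples;
    for (int s = 0; s < kSamples; ++s) {
      auto start = std::chrono::steady_clock::now();
      for (int r = 0; r < kRepsPerSample; ++r) workloadOfInterest();
      auto end = std::chrono::steady_clock::now();
      double ns = std::chrono::duration<double, std::nano>(end - start).count();
      samples.push_back(ns / kRepsPerSample);  // per-iteration cost
    }

    // Keep all samples rather than one number, so the distribution itself
    // can be inspected later.
    double total = 0;
    for (double s : samples) total += s;
    std::printf("mean: %.1f ns/iter over %d samples\n", total / kSamples, kSamples);
    return 0;
  }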


Benchmarking

● Now we have a benchmark, how do we interpret/report it?
  – We must compare
    ● Benchmark vs expectation/mental model
    ● Different solutions
    ● Over time
  – Results are often normalized against the baseline
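As a tiny illustration of normalizing against a baseline (the numbers are hypothetical):

  #include <cstdio>

  int main() {
    // Hypothetical times for one benchmark under the baseline and a change.
    double baseline = 12.5, candidate = 10.0;  // seconds
    std::printf("normalized time: %.2f (speedup: %.2fx)\n",
                candidate / baseline, baseline / candidate);
    return 0;
  }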


Benchmarking

● Now we have a benchmark, how do we interpret/report it?
  – We must compare
  – We must remember results are statistical
    ● Show the distribution (e.g. violin plots) [Seaborn Violinplot]
    ● Summarize the distribution (e.g. mean and confidence intervals, box & whisker) [Seaborn Boxplot] [Seaborn Barplot]
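A sketch of the summarization step (assuming the sample means are roughly normal; 1.96 gives an approximate 95% confidence interval):

  #include <cmath>
  #include <cstdio>
  #include <vector>

  int main() {
    // Hypothetical timing samples in milliseconds.
    std::vector<double> samples = {10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.0, 10.3};
    const double n = samples.size();

    double mean = 0;
    for (double s : samples) mean += s;
    mean /= n;

    double var = 0;
    for (double s : samples) var += (s - mean) * (s - mean);
    var /= (n - 1);  // sample variance

    double ci = 1.96 * std::sqrt(var / n);  // ~95% CI for the mean
    std::printf("%.2f ms +/- %.2f ms\n", mean, ci);
    return 0;
  }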


Benchmarking

● A benchmark suite comprises multiple benchmarks

● Now we have multiple results, how should we consider them?
  – 2 major scenarios
    ● Hypothesis testing
      – Is solution A different than B?
      – You can use ANOVA
    ● Summary statistics
      – Condensing a suite to a single number
      – Intrinsically lossy, but can still be useful

[Chart: results for Old vs. New across benchmarks T1–T6]

  Old: ?    New: ?    New/Old: ?


Summary Statistics

Averages of $r_1, r_2, \ldots, r_N$

● Many ways to measure expectation or tendency

● Arithmetic Mean:  $\frac{1}{N}\sum_{i=1}^{N} r_i$

● Harmonic Mean:  $\frac{N}{\sum_{i=1}^{N} \frac{1}{r_i}}$

● Geometric Mean:  $\sqrt[N]{\prod_{i=1}^{N} r_i}$

Each type means something different and has valid uses


Summary Statistics

● Arithmetic Mean:  $\frac{1}{N}\sum_{i=1}^{N} r_i$
  – Good for reporting averages of numbers that mean the same thing
  – Used for computing sample means
  – e.g. Timing the same workload many times (handling nondeterminism):

      for (x in 0 to 4)
        times[x] = doWorkloadOfInterest();
      E(time) = arithmean(times)

● Harmonic Mean:  $\frac{N}{\sum_{i=1}^{N} \frac{1}{r_i}}$
  – Good for reporting rates
  – e.g. Required throughput for a set of tasks

Given tasks t1, t2, & t3 serving 40 pages each:
  throughput(t1) = 10 pages/sec
  throughput(t2) = 20 pages/sec
  throughput(t3) = 20 pages/sec

What is the average throughput? What should it mean?
  Arithmetic = 16.7 p/s  →  120/16.7 = 7.2 sec
  Harmonic   = 15 p/s    →  120/15 = 8 sec

The harmonic mean identifies the constant rate required for the same total time: the tasks actually take 40/10 + 40/20 + 40/20 = 8 seconds, which 120 pages at 15 p/s matches, while the arithmetic mean's 7.2 seconds does not.

CAVEAT: If the size of each workload changes, a weighted harmonic mean is required!
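A quick numeric check of these figures (a sketch, not from the slides):

  #include <cstdio>

  int main() {
    const double rates[] = {10.0, 20.0, 20.0};  // pages/sec for t1, t2, t3
    const int n = 3;
    const double pages_per_task = 40.0;

    double sum = 0, inv_sum = 0;
    for (double r : rates) { sum += r; inv_sum += 1.0 / r; }
    double arith = sum / n;      // 16.7 p/s
    double harm  = n / inv_sum;  // 15 p/s

    // The actual elapsed time is 40/10 + 40/20 + 40/20 = 8 seconds;
    // only the harmonic mean reproduces it: 120 pages / 15 p/s = 8 s.
    std::printf("arithmetic: %.1f p/s -> %.1f s\n", arith, n * pages_per_task / arith);
    std::printf("harmonic:   %.1f p/s -> %.1f s\n", harm,  n * pages_per_task / harm);
    return 0;
  }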

Summary Statistics

● Geometric Mean:  $\sqrt[N]{\prod_{i=1}^{N} r_i}$
  – Good for reporting results that mean different things
  – e.g. Timing results across many different benchmarks

Any idea why it may be useful here? (A bit of a thought experiment)

Suppose Old produces times $r_1$ on a large benchmark T1 and $r_2$ on a much smaller benchmark T2. New 1 halves the time on T1; New 2 halves the time on T2. With the arithmetic mean, the (non) change to the large T1 dominates any behavior for T2: halving $r_1$ nearly halves the average, while halving the small $r_2$ barely moves it.

Geometric:
  Old:    $\sqrt{r_1 \times r_2}$
  New 1:  $\sqrt{(\frac{1}{2} r_1) \times r_2}$
  New 2:  $\sqrt{r_1 \times (\frac{1}{2} r_2)}$

Both equal $\sqrt{\frac{1}{2} \times r_1 \times r_2}$, so halving either benchmark scales the mean the same way.

  – A 10% difference in any benchmark affects the final value the same way

Note: It doesn't have an intuitive meaning!
It does provide a balanced score of performance.

See [Mashey 2004] for deeper insights.
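To see the balance property numerically (a sketch; the values are illustrative):

  #include <cmath>
  #include <cstdio>

  // Geometric mean of two benchmark scores.
  static double geomean(double a, double b) { return std::sqrt(a * b); }

  int main() {
    double r1 = 100.0, r2 = 1.0;  // one large and one small benchmark time

    double base = geomean(r1, r2);
    double new1 = geomean(r1 / 2, r2);  // halve the large benchmark
    double new2 = geomean(r1, r2 / 2);  // halve the small benchmark

    // Both changes scale the geometric mean by the same factor, sqrt(1/2),
    // whereas the arithmetic mean barely notices halving r2.
    std::printf("geo  old: %.3f  new1: %.3f  new2: %.3f\n", base, new1, new2);
    std::printf("arith old: %.3f  new1: %.3f  new2: %.3f\n",
                (r1 + r2) / 2, (r1 / 2 + r2) / 2, (r1 + r2 / 2) / 2);
    return 0;
  }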

Benchmarking

● In practice applying good benchmarking & statistics is made easier via frameworks
  – Google Benchmark (C & C++)
  – Google Caliper (Java)
  – Nonius
  – Celero
  – Easybench
  – Pyperf
  – ...
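For instance, with Google Benchmark a microbenchmark is declared in a few lines, and the framework handles warm-up, repetition counts, and statistical reporting (the measured workload here is just an illustrative placeholder):

  #include <benchmark/benchmark.h>
  #include <vector>

  static void BM_VectorPushBack(benchmark::State& state) {
    for (auto _ : state) {  // the framework decides how many iterations to time
      std::vector<int> v;
      for (int i = 0; i < state.range(0); ++i) v.push_back(i);
      benchmark::DoNotOptimize(v.data());  // keep the work observable
    }
  }
  BENCHMARK(BM_VectorPushBack)->Arg(1024);

  BENCHMARK_MAIN();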

Investigating Performance


Profiling

● When benchmark results do not make sense, you should investigate why
  – For resource X, where is X being used, acquired, and/or released?

● Sometimes microbenchmarks provide sufficient insight

● In other cases you will want to profile
  – Collect additional information about resources in an execution
  – The nature of the tool will depend on the resource and the objective

You should already be familiar with tools like gprof or jprofile.
We'll examine some more advanced profilers now.


Heap profiling

● Suppose I have a task and it consumes all memory
  – Note: This is not hypothetical. This often happens with grad students!
  – If I can identify where & why memory is consumed, I can remediate
    ● Maybe a better algorithm
    ● Maybe competent use of data structures
    ● ...

● Heap profilers track the allocations in a program & their provenance
  – Can identify hotspots, bloat, leaks, short lived allocations, ...
  – Usually sample based, but sometimes event based
  – e.g. Massif, Heaptrack, ...


Heap profiling

  #include <chrono>
  #include <cstddef>
  #include <memory>
  #include <thread>
  #include <vector>

  // Illustrative sizes; the slides leave the actual values to the demo.
  constexpr std::size_t DATA_SIZE = 64;
  constexpr std::size_t BLOCK_SIZE = 1 << 20;

  int main() {
    std::vector<std::unique_ptr<long[]>> data{DATA_SIZE};

    for (auto &element : data) {
      element = std::make_unique<long[]>(BLOCK_SIZE);
      // do something with element
      std::this_thread::sleep_for(std::chrono::milliseconds(100));
    }

    std::this_thread::sleep_for(std::chrono::seconds(1));
    return 0;
  }

Profile it with either tool:

  valgrind --time-unit=ms --tool=massif <program invocation>
  heaptrack <program invocation>

Then visualize the recorded data:

  massif-visualizer massif.out.<PID>
  heaptrack_gui <path to data>

Heap profiling

  int main() {
    std::vector<std::unique_ptr<long[]>> data{DATA_SIZE};

    for (auto &element : data) {
      element = std::make_unique<long[]>(BLOCK_SIZE);
      // do something with element
      std::this_thread::sleep_for(std::chrono::milliseconds(100));
      element.reset();  // release each block right after use
      std::this_thread::sleep_for(std::chrono::milliseconds(100));
    }

    std::this_thread::sleep_for(std::chrono::seconds(1));
    return 0;
  }

How do we expect this to differ?


CPU Profiling & Flame Graphs

● When CPU is the resource, investigate where the CPU is spent– Classic profilers – gprof, oprofile, jprof, ...

● Classic CPU profilers capture a lot of data and force the user to explore & explain it manually

● Flame graphs provide a way of structuring and visualizing substantial profiling information– Consumers of CPU on top– ancestry, proportions, components can all be clearly identified

main()

foo() bar()

baz() quux()

Page 132: Measurement & Performancewsumner/teaching/745/03-measurement.pdf · Benchmarking We must reason rigorously about performance during assessment, investigation, & improvement Assessing

CPU Profiling & Flame Graphs

● Can extract rich information by embedding interesting things in colors

[Gregg, ATC 2017]

Page 133: Measurement & Performancewsumner/teaching/745/03-measurement.pdf · Benchmarking We must reason rigorously about performance during assessment, investigation, & improvement Assessing

CPU Profiling & Flame Graphs

● Flame graphs are not just limited to CPU time!– Any countable resource or event can be organized & visualized

Page 134: Measurement & Performancewsumner/teaching/745/03-measurement.pdf · Benchmarking We must reason rigorously about performance during assessment, investigation, & improvement Assessing

CPU Profiling & Flame Graphs

● Flame graphs are not just limited to CPU time!– Any countable resource or event can be organized & visualized

● You can also automatically generate them with clang & chrome– See project X-Ray in clang

Perf & event profiling

● Sometimes low-level architectural effects determine the performance
– Cache misses
– Misspeculations
– TLB misses

How well does sample based profiling work for these?

● Instead, we can leverage low level system counters via tools like perf:

    perf stat -e <events> -g <command>
    perf record -e <events> -g <command>
    perf report
    perf list

with events like:

    task-clock, context-switches, cpu-migrations, page-faults, cycles,
    instructions, branches, branch-misses, cache-misses,
    cycle_activity.stalls_total

Profiling for opportunities

● Causal profiling

What should I look at to speed things up?

[Figure: execution timelines in which foo() runs repeatedly; the causal question is how much the total runtime would improve if foo() were made faster. Tools like Coz answer it by virtually speeding up one code fragment (slowing everything else down proportionally) and measuring the end-to-end effect.]

● Profiling for parallelism

[Figure: the same timelines examined for parallelism: which of the repeated foo() calls are independent and could overlap if run concurrently.]

Improving Performance

● We can attack performance at several levels
– Compilers & tuning the build process
– Managing the organization of data
– Managing the organization of code
– Better algorithms & algorithmic modeling

● In all cases, we only care about improving performance of hot code

● Optimizing cold code can hurt software

Compiling for performance

● Enabling optimizations...

● LTO (Link Time Optimization / Whole Program Optimization)

[Figure: without LTO, foo.c and bar.c are compiled & optimized separately into foo.o and bar.o, which are then linked into the program. With LTO, foo.o and bar.o are merged into a single program-wide module that is optimized again before linking, so the optimizer can see across translation units.]

● PGO/FDO (Profile Guided Optimization / Feedback Directed Optimization)
– Incorporate profile information in optimization decisions

    funPtr = ?
    ...
    funPtr()        // possible targets: foo() { A }, bar() { B }

If the profile shows the call almost always targets bar(), the compiler can speculatively devirtualize:

    funPtr = ?
    ...
    if funPtr == bar:
        B’          // bar()'s body, inlined and further optimized
    else:
        funPtr()

[Visual Studio profile guided optimizations]
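As one concrete workflow (an illustration, not from the slides): clang supports building with -fprofile-generate, running the instrumented binary on representative inputs, merging the emitted .profraw files with llvm-profdata, and rebuilding with -fprofile-use=<merged.profdata> so that measured branch and call-target frequencies inform these decisions.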

● Layout optimization (BOLT and otherwise)

● Polyhedral analysis

Optimizing Your Data

● The basic directions of data optimizations
– Ensure the data you want is available for the tasks you have
– Do not spend time on processing you do not need
– Do not spend extra time managing the data at the system level

Several aspects of high level design may be in tension with these.

● Basic structure packing
– Smaller aggregates consume less cache
– Carefully encoding data or reusing storage can do more (see the pointer-packing example further below)

    struct S1 { char a; };
    sizeof(S1) == 1

    struct S2 { uint32_t b; };
    sizeof(S2) == 4

    struct S3 {
      char a;
      uint32_t b;
      char c;
    };
    sizeof(S3) == 12

uint32_t must be 4 byte aligned. Padding is inserted!

    struct S4 {
      char a;
      char c;
      uint32_t b;
    };
    sizeof(S4) == 8

Careful ordering improves cache utilization.
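A minimal sketch for checking these layouts yourself; padding is implementation-defined, so the asserted sizes assume a typical ABI where uint32_t is 4-byte aligned:

    #include <cstdint>

    struct S3 { char a; uint32_t b; char c; };  // 1 + 3 padding + 4 + 1 + 3 padding
    struct S4 { char a; char c; uint32_t b; };  // 1 + 1 + 2 padding + 4

    static_assert(sizeof(S3) == 12, "S3 is padded to 12 bytes on this ABI");
    static_assert(sizeof(S4) == 8,  "S4 is padded to 8 bytes on this ABI");

    int main() { return 0; }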

● Operate on compressed data
● Steal low/high order bits of pointers

    // Specialization that packs an int tag into the low 3 bits of a pointer.
    // Assumes PointedTo objects are at least 8-byte aligned.
    template <class PointedTo>
    class PointerValuePair<PointedTo, int> {
      uintptr_t compact;
    public:
      PointedTo* getP() const {
        return reinterpret_cast<PointedTo*>(compact & ~uintptr_t{0x7});
      }
      int getV() const { return static_cast<int>(compact & 0x7); }
    };
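A self-contained sketch of the bit-stealing idea; the class name, setter, and assertions are illustrative additions, and it assumes the pointee is at least 8-byte aligned so the low 3 bits are free:

    #include <cassert>
    #include <cstdint>

    // Pack a pointer and a small tag into one machine word.
    template <class T>
    class TaggedPtr {
      std::uintptr_t compact = 0;
    public:
      void set(T* p, int tag) {
        auto bits = reinterpret_cast<std::uintptr_t>(p);
        assert((bits & 0x7) == 0 && tag >= 0 && tag < 8);
        compact = bits | static_cast<std::uintptr_t>(tag);
      }
      T* ptr() const { return reinterpret_cast<T*>(compact & ~std::uintptr_t{0x7}); }
      int tag() const { return static_cast<int>(compact & 0x7); }
    };

    int main() {
      alignas(8) static long value = 42;
      TaggedPtr<long> tp;
      tp.set(&value, 5);
      assert(tp.ptr() == &value && tp.tag() == 5);
      return 0;
    }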

● Managing indirection
– Pointers and indirection can stall the CPU while waiting on memory

    std::list<int> numbers = ...;
    for (auto& i : numbers) {
      ...
    }

We already saw this. Traversing a linked list is expensive!

[Figure: the nodes holding 3 → 1 → 4 → 1 are scattered across memory; each hop dereferences a pointer and stalls until the next node arrives.]

These elements are unlikely to be in cache and unlikely to be prefetched automatically.

How does this relate to design tools that we have seen?
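For contrast, a minimal sketch of the contiguous alternative: with std::vector the elements sit in one allocation, so loads are consecutive and prefetch-friendly (the data and loop body are placeholders):

    #include <vector>

    int main() {
      // Contiguous storage: one allocation, consecutive elements.
      std::vector<int> numbers = {3, 1, 4, 1};
      long sum = 0;
      for (int i : numbers) {   // sequential loads; easy for the prefetcher
        sum += i;
      }
      return static_cast<int>(sum);  // 9
    }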

● Grouping things that are accessed together
– Guiding spatial design by temporal locality can improve cache utilization
– Cold field outlining

    struct Dog {
      uint32_t friendliness;
      uint32_t age;
      uint32_t ownerID;
      std::string hobby;
      Food treats[10];
    };

    for (Dog& d : dogs) {
      play(d.friendliness, d.hobby);
    }

Only friendliness and hobby are hot in this loop. We can try to push the cold fields out of the cache:

    struct Cold {
      uint32_t age;
      uint32_t ownerID;
      Food treats[10];
    };

    struct HotDog {
      uint32_t friendliness;
      std::string hobby;
      std::unique_ptr<Cold> cold;
    };

Benefits depend on the size of Cold & the access patterns.

– AoS vs SoA (Array of Structs vs Struct of Arrays)

    // AoS: Array of Structs
    struct Dog {
      uint32_t friendliness;
      uint32_t age;
      uint32_t ownerID;
      std::string hobby;
      Food treats[10];
    };

    // SoA: Struct of Arrays
    struct DogManager {
      std::vector<uint32_t> friendliness;
      std::vector<uint32_t> age;
      std::vector<uint32_t> ownerID;
      std::vector<std::string> hobby;
      std::vector<std::array<Food,10>> treats;
    };

    for (auto i : range(dogs)) {
      play(friendliness[i], hobby[i]);
    }

[Figure: in AoS, each dog's friendliness and hobby are interleaved with its cold fields; in SoA, all friendliness values are contiguous, as are all age and hobby values.]

You can pick and choose while still getting good locality.

Easier for compilers to vectorize.

Also a foundation of modern game engine design (ECS).
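A self-contained sketch of the two layouts; Food, play, and the exact field set are simplified from the slides, and the hot loop touches only the arrays it needs in the SoA version:

    #include <cstddef>
    #include <cstdint>
    #include <string>
    #include <vector>

    // AoS: one struct per dog; hot and cold fields share cache lines.
    struct Dog {
      std::uint32_t friendliness;
      std::uint32_t age;          // cold field, still loaded with the struct
      std::string hobby;
    };

    // SoA: one array per field.
    struct DogManager {
      std::vector<std::uint32_t> friendliness;
      std::vector<std::uint32_t> age;
      std::vector<std::string> hobby;
    };

    std::uint64_t playAoS(const std::vector<Dog>& dogs) {
      std::uint64_t score = 0;
      for (const Dog& d : dogs)            // drags age through the cache too
        score += d.friendliness + d.hobby.size();
      return score;
    }

    std::uint64_t playSoA(const DogManager& m) {
      std::uint64_t score = 0;
      for (std::size_t i = 0; i < m.friendliness.size(); ++i)
        score += m.friendliness[i] + m.hobby[i].size();  // only hot arrays touched
      return score;
    }

    int main() {
      std::vector<Dog> dogs{{5, 3, "fetch"}, {7, 2, "frisbee"}};
      DogManager m{{5, 7}, {3, 2}, {"fetch", "frisbee"}};
      return playAoS(dogs) == playSoA(m) ? 0 : 1;
    }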

● Loop invariance
– Avoid recomputing the same values inside a loop
– Compilers automate this (LICM) but cannot always succeed

    // Before: sqrt(2) is recomputed on every iteration
    for (auto i : ...) {
      auto sqrt2 = sqrt(2);
      auto x = f(i, sqrt2);
      ...
    }

    // After: the invariant computation is hoisted out of the loop
    auto sqrt2 = sqrt(2);
    for (auto i : ...) {
      auto x = f(i, sqrt2);
      ...
    }
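A sketch of a case where LICM alone may not fire: if the compiler cannot prove a call is invariant (here strlen on a buffer the loop also writes through), the programmer has to hoist it; the function names are illustrative:

    #include <cctype>
    #include <cstddef>
    #include <cstring>

    // The compiler may re-evaluate strlen(s) on every iteration unless it can
    // prove the loop body never changes the string's length.
    void upcaseSlow(char* s) {
      for (std::size_t i = 0; i < std::strlen(s); ++i)
        s[i] = static_cast<char>(std::toupper(static_cast<unsigned char>(s[i])));
    }

    // Manually hoisted: the invariant length is computed once.
    void upcaseFast(char* s) {
      const std::size_t n = std::strlen(s);
      for (std::size_t i = 0; i < n; ++i)
        s[i] = static_cast<char>(std::toupper(static_cast<unsigned char>(s[i])));
    }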

● Inner loop locality
– The simplest scenarios are like the matrix example we first saw

    uint32_t matrix[rows*cols];

    // Row-major traversal: elements 1, 2, 3, 4, 5, ... are visited in order.
    for (size_t row = 0; row < rows; ++row) {
      for (size_t col = 0; col < cols; ++col) {
        foo(matrix[cols*row + col]);
      }
    }

Memory accesses are consecutive!

    // Column-major traversal of the same row-major layout: the visit order
    // jumps by a full row between accesses (1, 4, 2, 5, 3, ...).
    for (size_t row = 0; row < rows; ++row) {
      for (size_t col = 0; col < cols; ++col) {
        foo(matrix[rows*col + row]);
      }
    }

Memory accesses jump around & thrash the cache!

– Matrix operations (e.g. multiplication) can require extra work

[Figure: computing C = A × B walks A along a row while walking B down a column.]

Problem: Using the same (row-major) layout for both operands creates bad locality for B.

Solution: Transpose B first, and implement the multiplication over the transpose instead.

Note: Better solutions further leverage layout & parallelization.
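A minimal sketch of the transformation for square row-major matrices (no blocking or parallelism, which better solutions would add):

    #include <cstddef>
    #include <vector>

    using Matrix = std::vector<double>;  // row-major, n*n elements

    // Naive: the inner loop walks b down a column, striding by n each step.
    Matrix multiplyNaive(const Matrix& a, const Matrix& b, std::size_t n) {
      Matrix c(n * n, 0.0);
      for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
          for (std::size_t k = 0; k < n; ++k)
            c[i*n + j] += a[i*n + k] * b[k*n + j];  // b access strides by n
      return c;
    }

    // Transposed: copy B^T once, then both operands stream consecutively.
    Matrix multiplyTransposed(const Matrix& a, const Matrix& b, std::size_t n) {
      Matrix bt(n * n);
      for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
          bt[j*n + i] = b[i*n + j];

      Matrix c(n * n, 0.0);
      for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
          double sum = 0.0;
          for (std::size_t k = 0; k < n; ++k)
            sum += a[i*n + k] * bt[j*n + k];        // both consecutive
          c[i*n + j] = sum;
        }
      return c;
    }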

● Memory management effects
– Data structure packing & access patterns affect deeper system behavior

● What about virtual memory, page tables, & the TLB?
● What about allocation strategies & fragmentation?

● Designing with clear ownership policies in mind
– Resource acquisition should not happen in hot code
– Use APIs that express intent & prevent copying

“std::string is responsible for almost half of all allocations in the Chrome browser process”

    template <class E>
    struct Span {
      E* first;
      size_t count;

      template <size_t N>
      Span(std::array<E, N>& c);

      Span(std::vector<E>& c);
    };
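A usage sketch of the idea; the sum function and constructor bodies are illustrative additions, and C++20's std::span plays exactly this role in the standard library:

    #include <array>
    #include <cstddef>
    #include <vector>

    // A non-owning view over contiguous elements: no copy, no allocation.
    template <class E>
    struct Span {
      E* first;
      std::size_t count;

      template <std::size_t N>
      Span(std::array<E, N>& c) : first(c.data()), count(N) {}
      Span(std::vector<E>& c) : first(c.data()), count(c.size()) {}
    };

    // One function accepts arrays and vectors alike.
    long sum(Span<int> s) {
      long total = 0;
      for (std::size_t i = 0; i < s.count; ++i) total += s.first[i];
      return total;
    }

    int main() {
      std::vector<int> v{1, 2, 3};
      std::array<int, 2> a{4, 5};
      return static_cast<int>(sum(v) + sum(a));  // 6 + 9
    }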

Optimizing Your Code

● Basic ideas for code optimization
– Avoid branching whenever possible: misspeculating over a branch is costly
– Make code that does the same thing occur close together temporally: leverage the instruction cache if you can

Optimizing Your Code

● Branch prediction & speculation

Page 254: Measurement & Performancewsumner/teaching/745/03-measurement.pdf · Benchmarking We must reason rigorously about performance during assessment, investigation, & improvement Assessing

Optimizing Your Code

● Branch prediction & speculation– On if statements

for (...) { if (foo(c)) { bar(); } else { baz(); }}

A

B

90%

10%

Page 255: Measurement & Performancewsumner/teaching/745/03-measurement.pdf · Benchmarking We must reason rigorously about performance during assessment, investigation, & improvement Assessing

Optimizing Your Code

● Branch prediction & speculation– On if statements

for (...) { if (foo(c)) { bar(); } else { baz(); }}

A

B

Pipeline:90%

10%

A A A

Actual: A

Page 256: Measurement & Performancewsumner/teaching/745/03-measurement.pdf · Benchmarking We must reason rigorously about performance during assessment, investigation, & improvement Assessing

Optimizing Your Code

● Branch prediction & speculation– On if statements

for (...) { if (foo(c)) { bar(); } else { baz(); }}

A

B

Pipeline:90%

10%

A A A

Actual: A A

Page 257: Measurement & Performancewsumner/teaching/745/03-measurement.pdf · Benchmarking We must reason rigorously about performance during assessment, investigation, & improvement Assessing

Optimizing Your Code

● Branch prediction & speculation– On if statements

for (...) { if (foo(c)) { bar(); } else { baz(); }}

A

B

Pipeline:90%

10%

A A A

Actual: A A B

Page 258: Measurement & Performancewsumner/teaching/745/03-measurement.pdf · Benchmarking We must reason rigorously about performance during assessment, investigation, & improvement Assessing

Optimizing Your Code

● Branch prediction & speculation– On if statements

for (...) { if (foo(c)) { bar(); } else { baz(); }}

A

B

Pipeline:90%

10%

A A A

Actual: A A B

Stall, but relatively infrequently

Page 259: Measurement & Performancewsumner/teaching/745/03-measurement.pdf · Benchmarking We must reason rigorously about performance during assessment, investigation, & improvement Assessing

Optimizing Your Code

● Branch prediction & speculation– On if statements

for (...) { if (foo(c)) { bar(); } else { baz(); }}

A

B

51%

49%

Page 260: Measurement & Performancewsumner/teaching/745/03-measurement.pdf · Benchmarking We must reason rigorously about performance during assessment, investigation, & improvement Assessing

Optimizing Your Code

● Branch prediction & speculation– On if statements

for (...) { if (foo(c)) { bar(); } else { baz(); }}

A

B

Pipeline:51%

49%

A A A

Actual: A B

Stall, frequently

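Why that bias matters, as a hedged micro-benchmark sketch (my construction, not from the slides): the loop is identical in both runs; only the predictability of the data changes. On typical hardware the sorted run is much faster, though an aggressive optimizer may convert the branch to a conditional move and flatten the difference.

#include <algorithm>
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <random>
#include <vector>

// The branch direction depends entirely on the data.
static int64_t branchy_sum(const std::vector<int>& v) {
    int64_t sum = 0;
    for (int x : v)
        if (x >= 128) sum += x;
    return sum;
}

int main() {
    std::mt19937 gen(42);
    std::vector<int> shuffled(1 << 22);
    for (int& x : shuffled) x = static_cast<int>(gen() % 256);  // ~50/50 branch

    std::vector<int> sorted = shuffled;
    std::sort(sorted.begin(), sorted.end());  // long runs: nearly perfectly predicted

    for (const auto* v : {&shuffled, &sorted}) {
        auto t0 = std::chrono::steady_clock::now();
        int64_t s = branchy_sum(*v);
        auto t1 = std::chrono::steady_clock::now();
        auto us = std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0);
        std::printf("sum=%lld time=%lldus\n", (long long)s, (long long)us.count());
    }
}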
Pages 261–263:

Optimizing Your Code

● Branch prediction & speculation
– On if statements
– On function pointers!

for (...) {
  foo();    // foo may dispatch to bar() (target A) or baz() (target B)
}

bar() {}
baz() {}

The same problems arise: an unpredictable mix of targets (say 51% A,
49% B) makes the indirect branch mispredict frequently. Consistent call
targets perform better (see the sketch below).

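One way to get consistent call targets (a hypothetical illustration; Task, run_grouped, and friends are my names, not from the slides): partition the work by target first, so each indirect call site sees long runs of the same destination. Whether the partitioning pays for itself depends on the workload, so measure.

#include <algorithm>
#include <cstdio>
#include <vector>

using Handler = void (*)(int&);
static void bar(int& x) { x += 1; }
static void baz(int& x) { x *= 2; }

struct Task {
    Handler fn;
    int data;
};

// Interleaved targets: the indirect branch flips unpredictably.
void run_interleaved(std::vector<Task>& tasks) {
    for (auto& t : tasks) t.fn(t.data);
}

// Grouped targets: after partitioning, each run of calls hits a single
// target, which indirect branch predictors handle well.
void run_grouped(std::vector<Task>& tasks) {
    std::stable_partition(tasks.begin(), tasks.end(),
                          [](const Task& t) { return t.fn == &bar; });
    for (auto& t : tasks) t.fn(t.data);
}

int main() {
    std::vector<Task> tasks{{bar, 1}, {baz, 2}, {bar, 3}, {baz, 4}};
    run_grouped(tasks);
    for (const auto& t : tasks) std::printf("%d ", t.data);
    std::printf("\n");
}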
Pages 264–270:

Optimizing Your Code

● Designing away checks
– Repeated checks can be removed by maintaining invariants

[Wikipedia's Insertion Sort]
i ← 1
while i < length(A)
    j ← i
    while j > 0 and A[j-1] > A[j]
        swap A[j] and A[j-1]
        j ← j - 1
    i ← i + 1

Can we turn the semantic check into a bounds check? We just guarantee
that A starts with the smallest element: then A[j-1] > A[j] must fail
before j reaches 0, and the j > 0 check can be dropped.

k ← find_smallest(A)
swap A[0] and A[k]
i ← 1
while i < length(A)
    j ← i
    while A[j-1] > A[j]
        swap A[j] and A[j-1]
        j ← j - 1
    i ← i + 1

Equivalently, when a slot in front of the array is available, plant a
sentinel there:

A[-1] ← MIN_VALUE
i ← 1
while i < length(A)
    j ← i
    while A[j-1] > A[j]
        swap A[j] and A[j-1]
        j ← j - 1
    i ← i + 1

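A runnable C++ rendering of the sentinel version (my adaptation of the pseudocode above, not code from the slides):

#include <algorithm>
#include <cstdio>
#include <vector>

// Insertion sort without the j > 0 bounds check: moving the minimum to
// the front first makes a[0] a sentinel, so the comparison below is
// guaranteed to stop the inner loop before j reaches 0.
void insertion_sort_with_sentinel(std::vector<int>& a) {
    if (a.size() < 2) return;
    std::iter_swap(a.begin(), std::min_element(a.begin(), a.end()));
    for (std::size_t i = 1; i < a.size(); ++i) {
        std::size_t j = i;
        while (a[j - 1] > a[j]) {  // semantic check only
            std::swap(a[j], a[j - 1]);
            --j;
        }
    }
}

int main() {
    std::vector<int> v{5, 3, 8, 1, 9, 2};
    insertion_sort_with_sentinel(v);
    for (int x : v) std::printf("%d ", x);  // prints: 1 2 3 5 8 9
    std::printf("\n");
}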
Pages 271–276:

Optimizing Algorithms

● Improving real-world algorithmic performance comes from recognizing
the interplay between theory and hardware

● Hybrid algorithms
– Constants matter. Use thresholds to select algorithms.
– Use general N log N sorting for N above 300 [Alexandrescu 2019]
  (see the first sketch below)

● Caching & Precomputing
– If you will reuse results, save them and avoid recomputing
– If all possible results are compact, just compute a table up front
  (see the second sketch below)

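Both ideas in miniature. First, a hedged sketch of threshold-based selection (the helper names and exact cutoff placement are mine; the 300-element threshold is the one cited above, and the right value should be measured per platform):

#include <algorithm>
#include <cstddef>
#include <vector>

constexpr std::size_t kSmallSortThreshold = 300;  // tune by measuring

// O(N^2) but with tiny constants: wins below the threshold.
void insertion_sort(std::vector<int>& a) {
    for (std::size_t i = 1; i < a.size(); ++i)
        for (std::size_t j = i; j > 0 && a[j - 1] > a[j]; --j)
            std::swap(a[j], a[j - 1]);
}

// Hybrid: pick the algorithm by input size.
void hybrid_sort(std::vector<int>& a) {
    if (a.size() <= kSmallSortThreshold)
        insertion_sort(a);              // constants dominate at small N
    else
        std::sort(a.begin(), a.end());  // general N log N sort
}

Second, a precomputed table (a hypothetical byte-popcount table, not an example from the slides): all 256 possible inputs have one-byte answers, so the entire result space fits in 256 bytes computed once up front.

#include <array>
#include <cstdint>

constexpr std::array<std::uint8_t, 256> make_popcount_table() {
    std::array<std::uint8_t, 256> t{};
    for (int i = 1; i < 256; ++i)
        t[i] = static_cast<std::uint8_t>((i & 1) + t[i >> 1]);  // reuse t[i/2]
    return t;
}

inline constexpr auto kPopcount = make_popcount_table();

// Four table lookups instead of a bit-twiddling loop per call.
int popcount32(std::uint32_t x) {
    return kPopcount[x & 0xff] + kPopcount[(x >> 8) & 0xff]
         + kPopcount[(x >> 16) & 0xff] + kPopcount[x >> 24];
}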
Pages 277–289:

Optimizing Algorithms

● Better performance modeling & algorithms
– The core approaches we use have not adapted to changing contexts

● Classic asymptotic complexity is less useful in practice
– It uses an abstract machine model that is too approximate!
– Constants and artifacts of scale can actually dominate real-world
performance: a uniform cost model throws necessary information away
– We want modeling & algorithms that account for artifacts like:
memory, I/O, consistency & speculation, shapes of workloads

● Alternative approaches
– I/O complexity, I/O efficiency and cache awareness
  [Figure: a CPU moving blocks of size B between a fast Memory 1 and a
  larger Memory 2; complexity is measured in block transfers]
  (see the traversal sketch below)
– Cache-oblivious algorithms & data structures
  Similar to I/O, but agnostic to block size
– Parameterized complexity

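A small sketch of counting block transfers (my illustration, not from the slides): both loops perform exactly N*N additions, so a uniform cost model rates them identical, yet the row-major loop amortizes each block of B elements over B accesses while the column-major loop touches a fresh block on nearly every access.

#include <cstddef>
#include <cstdint>
#include <vector>

// Row-major traversal of an N x N matrix in one contiguous buffer:
// consecutive accesses share blocks, roughly N*N/B block transfers.
int64_t sum_row_major(const std::vector<int>& m, std::size_t n) {
    int64_t s = 0;
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            s += m[i * n + j];
    return s;
}

// Column-major traversal of the same buffer: each access strides by n,
// so for large n nearly every access costs a block transfer (~N*N).
int64_t sum_col_major(const std::vector<int>& m, std::size_t n) {
    int64_t s = 0;
    for (std::size_t j = 0; j < n; ++j)
        for (std::size_t i = 0; i < n; ++i)
            s += m[i * n + j];
    return s;
}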
Pages 290–296:

Optimizing Algorithms

● Classic design mistakes [Lu 2012]
– Uncoordinated functions (e.g. lack of batching): every call takes the
lock when one acquisition around the whole loop would do
(a compilable version follows this list)

for (auto& action : actions) {
  action.do()
}

Action::do() {
  acquire(mutex)
  ...
  release(mutex)
}

vs

acquire(mutex)
for (auto& action : actions) {
  action.do()
}
release(mutex)

– Skippable functions (e.g. transparent draws)
– Poor/unclear synchronization: a lock can hide at the bottom of an
innocent-looking call chain

foo() { bar() }
bar() { baz() }
baz() { quux() }
quux() { random() }
random() {
  acquire(mutex)
  ...
  release(mutex)
}

– Poor data structure selection

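A compilable rendering of the batching fix (a hedged sketch; Action, apply_locked, and the driver functions are my names, not from [Lu 2012]):

#include <mutex>
#include <vector>

struct Action {
    int payload = 0;
    void apply_locked() { payload += 1; }  // caller must hold actions_mutex
};

std::mutex actions_mutex;

// Uncoordinated: one lock acquisition per action.
void run_unbatched(std::vector<Action>& actions) {
    for (auto& a : actions) {
        std::lock_guard<std::mutex> g(actions_mutex);
        a.apply_locked();
    }
}

// Batched: a single acquisition amortized across all actions.
void run_batched(std::vector<Action>& actions) {
    std::lock_guard<std::mutex> g(actions_mutex);
    for (auto& a : actions)
        a.apply_locked();
}

Whether batching is safe depends on how long the loop holds the lock and what the actions do; the point is that the locking policy should be a deliberate, visible design decision rather than an accident of call structure.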
Pages 297–299:

Summary

● Reasoning rigorously about performance is challenging

● Good tooling can allow you to investigate performance well

● We can improve performance through
– compilers
– managing data
– managing code
– better algorithmic thinking