Measuring and Modeling Hyper-threaded Processor Performance Ethan Bolker UMass-Boston September 17, 2003


Page 1: Measuring and Modeling Hyper-threaded Processor Performance Ethan Bolker UMass-Boston September 17, 2003

Measuring and Modeling Hyper-threaded Processor Performance

Ethan Bolker

UMass-Boston

September 17, 2003

Page 2:

• Joint work with Yiping Ding, Arjun Kumar (BMC Software)

• Accepted for presentation at CMG32, December 2003

• Paper (with references) available on request

Page 3:

Improving Processor Performance

• Speed up clock

• Invent revolutionary new architecture

• Replicate processors (parallel application)

• Remove bottlenecks (use idle ALU)

– caches

– pipelining

– prefetch

Page 4:

Hyper-threading Technology (HTT)

Default for new Intel high end chips

• One ALU

• Duplicate state of computation (registers) to create two logical processors (chip size *= 1.05)

• Parallel instruction preparation (decode)

• ALU should see ready work more often (provided there are two active threads)

Page 5:

The path to instruction execution

Intel Technology Journal, Volume 06 Issue 01, February 14, 2002, p8

Page 6:

How little must we understand?

• Batch workload: repeated dispatch of identical compute-intensive jobs

– vary number of threads

– measure throughput (jobs/second)

• Treat processor as a black box

• Experiment to observe behavior

• Model to predict behavior

Page 7:

Batch throughput

[Figure: throughput (0 to 1000) versus number of threads (1 to 8). Series: one CPU, HTT off; one CPU, HTT on; two CPUs, HTT off; two CPUs, HTT on. Chart annotations: "makes sense", "puzzling", "makes sense".]

Page 8:

Transaction processing

• More interesting than batch

• Random size jobs arrive at random times

• M/M/1

M = “Markov”

M/*/*: arrival stream is Poisson, rate λ

*/M/*: job size exponentially distributed, mean s

*/*/1: single processor

Page 9:

M/M/1 model evaluation

• Utilization: U = λs

U is dimensionless: jobs/sec * sec/job

U < 1, else saturation

• Response time: r = s/(1-U)

Because of randomness, each job sees a (virtual) processor slowed down (by other jobs) by a factor of 1/(1-U), so accumulating s seconds of real work takes r = s/(1-U) seconds of real time
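
The two formulas on this slide are easy to tabulate. A minimal Python sketch follows; the function name `mm1_response_time` and the sample arrival rates are illustrative choices, not taken from the slides.

```python
def mm1_response_time(arrival_rate, mean_service_time):
    """Mean M/M/1 response time r = s/(1-U), where U = lambda * s."""
    utilization = arrival_rate * mean_service_time  # jobs/sec * sec/job
    if utilization >= 1:
        raise ValueError("saturated: U must be < 1")
    return mean_service_time / (1 - utilization)


if __name__ == "__main__":
    # s = 1 second; the arrival rates below are arbitrary sample values.
    for lam in (0.2, 0.5, 0.8, 0.9):
        print(f"lambda={lam:.1f}  r={mm1_response_time(lam, 1.0):.2f}")
```

With s = 1 the response time doubles at U = 0.5 and grows without bound as U approaches 1.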

Page 10:

Benchmark

• Java driver

– chooses interarrival times and service times from exponential distributions

– dispatches each job in its own thread

– records actual job CPU usage, response time

• Input parameters

– job arrival rate λ

– mean job service time s

• Fix s = 1 second, vary λ (hence U), track r
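
The idea behind the driver can be imitated without real threads. The Python sketch below is a simulation, not the authors' Java benchmark: it draws the same exponential interarrival and service times but serves jobs first-come-first-served on an idealized single server (an assumption; the real driver lets the operating system share the CPU among threads). The function name and parameters are hypothetical.

```python
import random


def simulate_mm1(arrival_rate, mean_service_time, n_jobs, seed=0):
    """Simulate an FCFS M/M/1 queue and return the mean response time."""
    rng = random.Random(seed)
    arrival = 0.0       # arrival time of the current job
    server_free = 0.0   # time at which all earlier jobs are finished
    total_response = 0.0
    for _ in range(n_jobs):
        arrival += rng.expovariate(arrival_rate)            # Poisson arrivals
        service = rng.expovariate(1.0 / mean_service_time)  # exponential size
        start = max(arrival, server_free)   # wait if the server is busy
        server_free = start + service
        total_response += server_free - arrival
    return total_response / n_jobs


if __name__ == "__main__":
    # Fix s = 1, vary lambda, compare with the M/M/1 prediction s/(1-U).
    for lam in (0.2, 0.5, 0.8):
        measured = simulate_mm1(lam, 1.0, 200_000)
        print(f"lambda={lam:.1f}  simulated={measured:.2f}  theory={1/(1-lam):.2f}")
```

For M/M/1 the mean response time is s/(1-U) under both first-come-first-served and processor sharing, so the simplification should not change the quantity being compared.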

Page 11:

Benchmark validation

[Figure: response time (0 to 4.5) versus utilization (0 to 1). Series: measured, predicted, measured/predicted. Chart annotations: practice: measured; theory: M/M/1, R = 1/(1-U).]

Page 12:

Theory vs practice

• “In theory, there is no difference between theory and practice. In practice, there is no relationship between theory and practice.” Grant Gainey

• “The gap between theory and practice in practice is much larger than the gap between theory and practice in theory.” Jeff Case

Page 13:

Explain/remove discrepancy

• Examine, tune benchmark driver

• Compute actual coefficients of variation, incorporate in corrected M/M/1 formula

• Nothing helps

• Postpone worry – in the meanwhile …
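
The slides do not say which corrected formula was tried. One standard candidate is the Pollaczek-Khinchine formula for M/G/1 queues, which keeps Poisson arrivals but lets the service-time coefficient of variation c differ from 1. Treating it as the correction meant here is an assumption, and the function name is hypothetical.

```python
def mg1_response_time(arrival_rate, mean_service_time, service_cv):
    """Pollaczek-Khinchine mean response time for an M/G/1 queue.

    r = s + U * s * (1 + c^2) / (2 * (1 - U)), with U = lambda * s.
    With c = 1 (exponential service) this reduces to s/(1-U).
    """
    u = arrival_rate * mean_service_time
    if u >= 1:
        raise ValueError("saturated: U must be < 1")
    waiting = u * mean_service_time * (1 + service_cv ** 2) / (2 * (1 - u))
    return mean_service_time + waiting


if __name__ == "__main__":
    # The c values below are arbitrary illustrations, not measured values.
    for cv in (0.0, 0.8, 1.0, 1.2):
        print(f"c={cv:.1f}  r={mg1_response_time(0.5, 1.0, cv):.3f}")
```

This form only accounts for variation in service times; a correction for non-Poisson arrivals would need a G/G/1 approximation instead.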

Page 14:

HTT on vs HTT off

• Use this benchmark to measure the effect of hyper-threading on response time

• Use throughput (λ) as the independent variable

• “Utilization” is ambiguous (digression)

Page 15:

HTT on vs HTT off

[Figure: response time (ratio) (0 to 4) versus throughput (0 to 1). Series: htt on, htt off, on/off.]

Page 16:

What’s happening

• Hyper-threading allows more of the application parallelism to make its way to the ALU

• Can we understand this quantitatively?

Page 17:

Model HTT architecture

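[Diagram: arrivals at rate λ split into two streams of rate λ/2, one per preparatory server; both streams then feed a single execution server.]

preparatory phase: service time s1 (each path sees arrival rate λ/2)

execution phase: service time s2 (sees the full arrival rate λ)

r = s1/(1 - (λ/2) s1) + s2/(1 - λ s2)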
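
The model's response time is the sum of two M/M/1-style terms, one for a preparatory path that sees half the arrivals and one for the execution phase that sees all of them. A minimal Python sketch, with a hypothetical function name; the sample values s1 = 0.13 and s2 = 0.81 are the ones quoted on the later "Theory vs practice" slide.

```python
def htt_response_time(arrival_rate, s1, s2):
    """Two-phase HTT model: r = s1/(1 - (lam/2) s1) + s2/(1 - lam s2)."""
    prep_util = (arrival_rate / 2) * s1   # each preparatory path sees lam/2
    exec_util = arrival_rate * s2         # the single ALU sees all of lam
    if prep_util >= 1 or exec_util >= 1:
        raise ValueError("saturated: both phase utilizations must be < 1")
    return s1 / (1 - prep_util) + s2 / (1 - exec_util)


if __name__ == "__main__":
    for lam in (0.0, 0.4, 0.8, 1.0):
        print(f"lambda={lam:.1f}  r={htt_response_time(lam, 0.13, 0.81):.3f}")
```

Because s2 is much larger than s1, the execution term dominates and the model saturates as λ approaches 1/s2.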

Page 18:

Theory vs practice

[Figure: response time (0 to 3.5) versus throughput (0 to 1). Series: measured, predicted, measured/predicted. Parameters: s1 = 0.13, s2 = 0.81.]

Page 19:

Model parameters

• To compute response time r from model, need (virtual) service parameters s1, s2 (λ is known)

• Finding s1, s2

– eyeball measured data

– fit two data points

– maximum likelihood

– derive from first principles

• s1 = 0.13, s2 = 0.81 make sense: roughly 15% of work is preparatory, 85% execution
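
As an illustration of parameter estimation, the Python sketch below uses a least-squares grid search. That technique is not one of the methods listed above; it is swapped in because it is short and needs no libraries. The function names are hypothetical, and the data points are synthetic, generated from the model itself rather than measured.

```python
def model(lam, s1, s2):
    """Response time predicted by the two-phase model on the slides."""
    return s1 / (1 - (lam / 2) * s1) + s2 / (1 - lam * s2)


def fit_grid(points):
    """Least-squares grid search over s1, s2 in 0.01 .. 0.99.

    points is a list of (lam, r) pairs; returns the best (s1, s2).
    """
    max_lam = max(lam for lam, _ in points)
    best_pair, best_err = None, float("inf")
    for i in range(1, 100):
        s1 = i / 100
        if (max_lam / 2) * s1 >= 1:
            continue  # preparatory phase would saturate
        for j in range(1, 100):
            s2 = j / 100
            if max_lam * s2 >= 1:
                continue  # execution phase would saturate
            err = sum((model(lam, s1, s2) - r) ** 2 for lam, r in points)
            if err < best_err:
                best_pair, best_err = (s1, s2), err
    return best_pair


if __name__ == "__main__":
    # Synthetic points generated from the model itself, NOT measured data.
    synthetic = [(lam, model(lam, 0.13, 0.81)) for lam in (0.2, 0.5, 0.8, 1.0)]
    print(fit_grid(synthetic))
```

On real measurements the fit would not be exact, and the 0.01 grid step limits the resolution of the estimates.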

Page 20:

Benchmark validation (reprise)

• Chip hardware unchanged when HTT off

• Assume one path used

• Tandem queue

• Parameter estimation as before
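
The slide does not write out the formula for this case. A natural reading, assumed here, is two M/M/1 stages in series that each see the full arrival rate λ, giving r = s1/(1 - λ s1) + s2/(1 - λ s2). The Python sketch below uses a hypothetical function name; the sample values s1 = 0.045 and s2 = 0.878 are the ones quoted on the following "Theory vs practice" slide.

```python
def tandem_response_time(arrival_rate, s1, s2):
    """1+1 tandem queue (assumed form): both stages see the full rate lam."""
    u1 = arrival_rate * s1
    u2 = arrival_rate * s2
    if u1 >= 1 or u2 >= 1:
        raise ValueError("saturated: both stage utilizations must be < 1")
    return s1 / (1 - u1) + s2 / (1 - u2)


if __name__ == "__main__":
    for lam in (0.0, 0.4, 0.8, 1.0):
        print(f"lambda={lam:.1f}  r={tandem_response_time(lam, 0.045, 0.878):.3f}")
```

The only difference from the HTT-on model is that the preparatory stage is not split, so its utilization is λ s1 rather than (λ/2) s1.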


Page 21:

Theory vs practice

[Figure: response time (ratio) (0 to 3.5) versus throughput (0 to 1). Series: measured, predicted, measured/predicted. Parameters: s1 = 0.045, s2 = 0.878.]

Page 22:

Future work

• Do serious statistics

• Does 1+1 tandem queue model predict hyper-threading response as well as complex 2+1 model?

• Understand two-processor machine puzzle

• Explore how s1 and s2 vary with application (e.g. fixed vs floating point)

• Find ways to estimate s1 and s2 from first principles

Page 23:

Summary

• Hyper-threading is …

• Abstraction (modelling) leverages information: you can often understand a lot even when you know very little

• r = s/(1-U) is worth remembering

• You do need to connect theory and practice – and practice is harder than theory

• Questions?