Transcript
Page 1: TM performance: seeing the whole picture or Looking back over the first 500 papers

TM performance: seeing the whole picture

or

Looking back over the first 500 papers

Tim Harris (MSR Cambridge)

Page 2

Page 3

How might we compare TM systems?

Where might TM be most useful?

Page 4

Extending Dan’s GC analogy

Concurrent GC algorithm (run GC in small steps in amongst the mutators)

A: “Here’s a way to reduce the pause times...”

B: “Here’s a way to support pinned objects...”

C: “Here’s a way to improve the throughput (total app runtime)...”

Page 5

Min mutator utilization

[Chart: min fraction of interval running mutator (0.0–0.9) against time interval / ms (0–12), for Algorithm A and Algorithm B.]
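The metric on this slide can be made concrete: minimum mutator utilization (MMU) for a window size w is the smallest fraction of any w-length interval during which the mutator, rather than the collector, was running. A minimal sketch, assuming a log of GC pause intervals; the function name and interval representation are illustrative, not from the talk:

```python
def min_mutator_utilization(pauses, total, window):
    """MMU over [0, total] for a given window length.

    pauses: list of (start, end) GC pause intervals within [0, total].
    """
    def paused_in(lo):
        # Total pause time overlapping the window [lo, lo + window].
        hi = lo + window
        return sum(max(0.0, min(hi, e) - max(lo, s)) for s, e in pauses)

    # Pause overlap is piecewise linear in the window's start position,
    # so its maximum occurs with a window edge aligned to a pause edge.
    candidates = {0.0, total - window}
    for s, e in pauses:
        for lo in (s - window, s, e - window, e):
            candidates.add(min(max(lo, 0.0), total - window))
    return min(1.0 - paused_in(lo) / window for lo in candidates)
```

Plotting this over a range of window sizes gives curves like the Algorithm A / Algorithm B comparison on the slide: a collector with short, well-spread pauses keeps utilization high even for small windows.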

Page 6

Five dimensions to TM behavior:
– Sequential overhead
– Scalability (to longer transactions)
– Scalability (to more cores)
– Tx-supported operations
– Semantics

Page 7

Scaling to large transactions

[Chart: normalized execution time (0.0–5.0) against tx size (0–10), for Algorithm A and Algorithm B; 1.0 = optimized sequential code (no tx, no locks).]

Page 8

Scaling: n*1-core copies

[Chart: normalized execution time (0–6) against #cores (0–10), for Algorithm A and Algorithm B; 1.0 = optimized sequential code (no tx, no locks).]

Page 9

Scaling: 1*n-core copy

[Chart: speedup over sequential (0–2.5) against #cores (0–10), for Algorithm A and Algorithm B; 1.0 = optimized sequential code (no tx, no locks).]

Page 10

How might we compare TM systems?

Where might TM be most useful?

Page 11

Application model #1

Sequential Parallelizable

f = fraction of original program that is parallelizable

Page 12

Application model #1

Sequential

Parallel

Parallel

Parallel

...

f = fraction of original program that is parallelizable
n = num parallel threads

Page 13

Application model #1

Sequential

Parallel, transactional

Parallel, transactional

Parallel, transactional

...

f = fraction of original program that is parallelizable
n = num parallel threads
x = straight-line transactional slow-down

Page 14

Conflict model

f = fraction of original program that is parallelizable
n = num parallel threads
x = straight-line transactional slow-down
c = mean number of attempts per transaction (1 => no conflicts)

[Diagram: transactions 1–6; either execute conflicting operations in series, or take a fixed number of alternatives and execute the different alternatives in parallel.]
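With these four variables the model can be written down as an Amdahl-style formula. The following is a reconstruction from the slide’s definitions, not an equation stated in the deck: the sequential fraction (1 − f) runs unchanged, while the parallel fraction f is divided over n threads and inflated by the slow-down x and the mean attempt count c.

```python
def speedup(f, n, x=1.0, c=1.0):
    # f: fraction of the program that is parallelizable
    # n: number of parallel threads
    # x: straight-line transactional slow-down
    # c: mean attempts per transaction (1 => no conflicts)
    # Amdahl-style: sequential part at native speed, parallel part
    # split n ways but paying the x and c penalties.
    return 1.0 / ((1.0 - f) + f * x * c / n)
```

For example, f = 0.95 on n = 16 threads with x = c = 1 gives about a 9.1x speedup, and a modest slow-down of x ≈ 1.2 pulls that down to roughly 8x, consistent with the “8x on 16 threads => 95% parallelizable” annotation later in the deck.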

Page 15

n=16, c=1.0, vary f, vary x

[Heat map: f (parallel proportion, 75%–100%) against x (straight-line transactional slow-down, powers of 1.1 from 1 to ≈5.56).]

f = fraction of original program that is parallelizable
n = num parallel threads
x = straight-line transactional slow-down
c = mean number of attempts per transaction (1 => no conflicts)

Page 16

n=16, c=1.0

[Heat map: f (parallel proportion, 75%–100%) against x (straight-line transactional slow-down, powers of 1.1 from 1 to ≈5.56).]

8x on 16 threads => 95% parallelizable

Page 17

n=16, c=1.0

[Heat map: f (parallel proportion, 75%–100%) against x (straight-line transactional slow-down, powers of 1.1 from 1 to ≈5.56).]

Straight-line slow-down bites quickly

Page 18

n=16, c=1.1 (1..1024)

[Heat map: f (parallel proportion, 75%–100%) against x (straight-line transactional slow-down, powers of 1.1 from 1 to ≈5.56).]

Page 19

n=16, c=1.4 (1..256)

[Heat map: f (parallel proportion, 75%–100%) against x (straight-line transactional slow-down, powers of 1.1 from 1 to ≈5.56).]

Page 20

n=16, c=2.0 (1..64)

[Heat map: f (parallel proportion, 75%–100%) against x (straight-line transactional slow-down, powers of 1.1 from 1 to ≈5.56).]

Page 21

n=16, c=3.1 (1..16)

[Heat map: f (parallel proportion, 75%–100%) against x (straight-line transactional slow-down, powers of 1.1 from 1 to ≈5.56).]

If Amdahl and overheads don’t get you, then conflicts still can...

Page 22

n=16, c=1.0, scaling of large tx

[Heat map: f (parallel proportion, 75%–100%) against x (straight-line transactional slow-down, powers of 1.1 from 1 to ≈5.56). Inset plot: x*f, for x from 0.0 to 4.0, values up to 10.0.]

Page 23

n=16, c=1.0, x*(f+(f^1.25)/4)

[Heat map: f (parallel proportion, 75%–100%) against x (straight-line transactional slow-down, powers of 1.1 from 1 to ≈5.56). Inset plot: x*f compared with x*(f+(f^1.25)/4), for x from 0.0 to 4.0, values up to 10.0.]

Page 24

n=16, c=1.0, x*(f+(f^2)/4)

[Heat map: f (parallel proportion, 75%–100%) against x (straight-line transactional slow-down, powers of 1.1 from 1 to ≈5.56). Inset plot: x*f compared with x*(f+(f^2)/4), for x from 0.0 to 4.0, values up to 10.0.]

Page 25

Application model #2: 100% parallel

[Diagram: each of n threads alternates Tx and Non-tx work.]

t = fraction of original program that is transactional
n = num parallel threads
x = straight-line transactional slow-down
c = mean number of attempts per transaction (1 => no conflicts)
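Because this model is 100% parallel, only the transactional fraction t pays any penalty. Again a reconstruction from the variable definitions rather than a formula stated in the deck, and the function name is illustrative: each thread’s unit of work is inflated from 1 to (1 − t) + t·x·c, so the speedup over one sequential copy is n divided by that factor.

```python
def speedup_all_parallel(t, n, x=1.0, c=1.0):
    # t: fraction of the work that runs inside transactions
    # n: number of parallel threads
    # x: straight-line transactional slow-down
    # c: mean attempts per transaction (1 => no conflicts)
    # All n threads run; only the transactional fraction is inflated.
    return n / ((1.0 - t) + t * x * c)
```

With t = 1 and x = 2, sixteen threads deliver only an 8x speedup; this is the effect the later slides summarize as overheads rapidly reducing the amount that transactions can be used.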

Page 26

Workloads (ASPLOS ’10)

[Scatter plot: t (transactional proportion, 0%–100%) against x (straight-line transactional slow-down, powers of 1.1 from 1 to ≈5.56), marking the Labyrinth, Genome, JBBAtomic, Vacation, and MaxFlow workloads.]

t = fraction of original program that is transactional
n = num parallel threads
x = straight-line transactional slow-down
c = mean number of attempts per transaction (1 => no conflicts)

Page 27

Workloads (ASPLOS ’10)

[Scatter plot repeated from the previous slide: t (transactional proportion, 0%–100%) against x (straight-line transactional slow-down), marking the Labyrinth, Genome, JBBAtomic, Vacation, and MaxFlow workloads.]

Page 28

n=16, c=1.0 (no conflicts)

[Heat map: t (transactional proportion, 0%–100%) against x (straight-line transactional slow-down, powers of 1.1 from 1 to ≈5.56).]

Page 29

n=16, c=1.0 (no conflicts)

[Heat map: t (transactional proportion, 0%–100%) against x (straight-line transactional slow-down, powers of 1.1 from 1 to ≈5.56).]

Overheads rapidly reduce the amount that transactions can be used

Page 30

n=16, c=1.1 (1..1024)

[Heat map: t (transactional proportion, 0%–100%) against x (straight-line transactional slow-down, powers of 1.1 from 1 to ≈5.56).]

Page 31

n=16, c=1.4 (1..256)

[Heat map: t (transactional proportion, 0%–100%) against x (straight-line transactional slow-down, powers of 1.1 from 1 to ≈5.56).]

Page 32

n=16, c=2.0 (1..64)

[Heat map: t (transactional proportion, 0%–100%) against x (straight-line transactional slow-down, powers of 1.1 from 1 to ≈5.56).]

Page 33

Conclusions

• Bad things come in threes...
– Amdahl’s law
– Sequential overhead
– Conflicts

• When developing TM systems we need to be careful about tradeoffs between these

• There’s a risk of “chasing around the TM design space”:
– Sequential overhead
– Scaling without conflicts
– Scaling with conflicts

