TM performance: seeing the whole picture
or
Looking back over the first 500 papers
Tim Harris (MSR Cambridge)
How might we compare TM systems?
Where might TM be most useful?
Extending Dan’s GC analogy
Concurrent GC algorithm (run GC in small steps, interleaved with the mutators)
A: “Here’s a way to reduce the pause times...”
B: “Here’s a way to support pinned objects...”
C: “Here’s a way to improve the throughput (total app runtime)...”
[Figure: Min mutator utilization for Algorithm A and Algorithm B. x-axis: time interval (ms), 0 to 12; y-axis: min fraction of interval running mutator]
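Minimum mutator utilization (MMU), the metric on this chart, is defined for each window length w as the smallest fraction of any w-long window of the run during which the mutator, rather than the collector, is running. A minimal sketch of computing it, assuming GC work is recorded as non-overlapping (start, end) pause intervals in milliseconds; the function names are hypothetical and not taken from any particular collector.

```python
from typing import List, Tuple

Pause = Tuple[float, float]  # (start_ms, end_ms) of one GC pause (assumed non-overlapping)


def gc_time_in(pauses: List[Pause], lo: float, hi: float) -> float:
    """Total GC time that falls inside the window [lo, hi]."""
    return sum(max(0.0, min(end, hi) - max(start, lo)) for start, end in pauses)


def min_mutator_utilization(pauses: List[Pause], run_ms: float, window_ms: float) -> float:
    """Smallest fraction of any window of length window_ms spent running the mutator.

    GC time inside a sliding window is piecewise linear in the window's start,
    so its maximum occurs where a window edge lines up with a pause edge, or
    where the window touches either end of the run. Checking those starts suffices.
    """
    if not 0.0 < window_ms <= run_ms:
        raise ValueError("window must be in (0, run_ms]")
    last_start = run_ms - window_ms
    starts = {0.0, last_start}
    for start, end in pauses:
        for t in (start, end, start - window_ms, end - window_ms):
            if 0.0 <= t <= last_start:
                starts.add(t)
    return min((window_ms - gc_time_in(pauses, t, t + window_ms)) / window_ms
               for t in starts)


# Hypothetical trace: two pauses in a 20 ms run.
trace = [(2.0, 4.0), (10.0, 11.0)]
for w in (2.0, 4.0, 10.0, 20.0):
    print(w, min_mutator_utilization(trace, 20.0, w))
```

Evaluating it over a range of window lengths produces the kind of curve compared here for Algorithm A and Algorithm B: short windows expose the longest pauses, long windows approach overall utilization.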
Five dimensions to TM behavior
• Sequential overhead
• Scalability (to longer transactions)
• Scalability (to more cores)
• Tx-supported operations
• Semantics
Scaling to large transactions
[Figure: normalized execution time (0.0 to 5.0) against tx size (0 to 10) for Algorithm A and Algorithm B]
1.0 = optimized sequential code (no tx, no locks)
Scaling: n*1-core copies
[Figure: normalized execution time (0 to 6) against #cores (0 to 10) for Algorithm A and Algorithm B]
1.0 = optimized sequential code (no tx, no locks)
Scaling: 1*n-core copy
[Figure: speedup over sequential (0 to 2.5) against #cores (0 to 10) for Algorithm A and Algorithm B]
1.0 = optimized sequential code (no tx, no locks)
How might we compare TM systems?
Where might TM be most useful?
Application model #1
[Diagram: the original program is split into a sequential part and a parallelizable part; the parallelizable part is divided across n threads, each running its share as “Parallel, transactional” work]
f = fraction of original program that is parallelizable
n = number of parallel threads
x = straight-line transactional slow-down
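The slide gives the parameters but not a formula. One natural way to turn them into a speedup estimate, offered as a sketch under our own assumptions rather than as the talk's exact model: the sequential part (1 - f) runs unchanged, while the parallelizable part f is split evenly over n threads and costs x times as much because it runs inside transactions. The parameter c (mean attempts per transaction, from the conflict model) multiplies that cost; with x = c = 1 this reduces to Amdahl's law. The name model1_speedup is hypothetical.

```python
def model1_speedup(f: float, n: int, x: float, c: float = 1.0) -> float:
    """Speedup over the original sequential program under application model #1.

    Assumed cost model (an interpretation, not quoted from the talk):
      time = (1 - f)          sequential part, unchanged
           + f * x * c / n    parallel part: split over n threads,
                              slowed down x times, attempted c times on average
    With x = c = 1 this is Amdahl's law.
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be in [0, 1]")
    if n < 1 or x <= 0.0 or c < 1.0:
        raise ValueError("need n >= 1, x > 0 and c >= 1")
    return 1.0 / ((1.0 - f) + f * x * c / n)


# 95% parallelizable on 16 threads, without and with a 2x transactional slow-down.
print(model1_speedup(0.95, 16, 1.0))
print(model1_speedup(0.95, 16, 2.0))
```

Because x and c enter only as the product x*c, a 2x straight-line slow-down costs exactly as much, under this model, as every transaction running twice on average.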
Conflict model
c = mean number of attempts per transaction (1 => no conflicts)
[Diagram: alternatives numbered 1 to 6]
• Fixed number of alternatives: different alternatives execute in parallel
• Conflicting operations execute in series
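The charts that follow label each c value with a range of alternatives (for example c=1.1 with 1..1024). One reading that roughly reproduces those numbers, offered as an assumption rather than the talk's stated method: each of n = 16 transactions picks one of K alternatives uniformly at random, transactions on different alternatives run in parallel, and those on the same alternative run in series, so the elapsed time, in transaction lengths, is the largest number of threads that chose the same alternative. Under that reading the estimate comes out at roughly 1.1, 1.4, 2.0 and 3.1 for K = 1024, 256, 64 and 16. The function name estimate_c is hypothetical.

```python
import random
from collections import Counter


def estimate_c(n_threads: int, n_alternatives: int,
               trials: int = 10_000, seed: int = 1) -> float:
    """Monte Carlo estimate of c under a balls-into-bins reading of the conflict model.

    Assumption (ours): in each trial every thread picks one alternative uniformly
    at random; threads on different alternatives run in parallel, threads on the
    same alternative run one after another. The time taken, measured in
    transaction lengths, is the largest number of threads sharing one alternative.
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    total = 0
    for _ in range(trials):
        picks = Counter(rng.randrange(n_alternatives) for _ in range(n_threads))
        total += max(picks.values())  # length of the longest serialized chain
    return total / trials


for k in (1024, 256, 64, 16):
    print(k, round(estimate_c(16, k), 2))
```

Fewer alternatives means more threads collide on the same one, so c grows quickly as the range shrinks.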
[Figure: n=16, c=1.0, vary f, vary x. x-axis: x (straight-line transactional slow-down), from 1 to about 5.6; y-axis: f (parallel proportion), from 75% to 100%]
8x speedup on 16 threads => needs about 95% of the program to be parallelizable
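The annotation can be checked by solving the model for f. A sketch, assuming the model #1 form speedup = 1 / ((1 - f) + f*x*c/n), which is our reading of the slide's parameters rather than a formula given in the talk; required_parallel_fraction is a hypothetical helper. With x = c = 1 the exact threshold for 8x on 16 threads is f = 14/15, about 93.3%, so 95% is the first labelled f value on the chart that reaches 8x. A slow-down of only x = 1.21 already raises the threshold to about 94.7%, and for any x above 2 the target cannot be reached at all, because even f = 1 gives at most 16/x.

```python
from typing import Optional


def required_parallel_fraction(target_speedup: float, n: int,
                               x: float = 1.0, c: float = 1.0) -> Optional[float]:
    """Smallest parallelizable fraction f that reaches target_speedup.

    Assumes speedup = 1 / ((1 - f) + f * x * c / n) and solves
    (1 - f) + f * x * c / n = 1 / target_speedup for f.
    Returns None when no f in [0, 1] reaches the target.
    """
    if target_speedup <= 1.0:
        return 0.0  # the sequential program already meets the target
    gain = 1.0 - x * c / n  # time saved per unit of work moved into the parallel part
    if gain <= 0.0:
        return None  # parallelizing makes the program no faster
    f = (1.0 - 1.0 / target_speedup) / gain
    return f if f <= 1.0 else None


print(required_parallel_fraction(8, 16))          # x = 1: equals 14/15
print(required_parallel_fraction(8, 16, x=1.21))  # a 21% transactional slow-down
print(required_parallel_fraction(8, 16, x=2.2))   # None: 8x is out of reach
```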
Straight-line slow-down bites quickly
[Figure: n=16, c=1.1 (1..1024). x-axis: x (straight-line transactional slow-down); y-axis: f (parallel proportion)]
[Figure: n=16, c=1.4 (1..256). x-axis: x (straight-line transactional slow-down); y-axis: f (parallel proportion)]
[Figure: n=16, c=2.0 (1..64). x-axis: x (straight-line transactional slow-down); y-axis: f (parallel proportion)]
[Figure: n=16, c=3.1 (1..16). x-axis: x (straight-line transactional slow-down); y-axis: f (parallel proportion)]
If Amdahl and overheads don’t get you, then conflicts still can...
[Figure: n=16, c=1.0, scaling of large tx. Main chart: x (straight-line transactional slow-down) against f (parallel proportion). Inset: cost curve x*f]
[Figure: n=16, c=1.0, x*(f+(f^1.25)/4). Main chart: x (straight-line transactional slow-down) against f (parallel proportion). Inset: cost curves x*f and x*(f+(f^1.25)/4)]
[Figure: n=16, c=1.0, x*(f+(f^2)/4). Main chart: x (straight-line transactional slow-down) against f (parallel proportion). Inset: cost curves x*f and x*(f+(f^2)/4)]
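These last charts make the transactional cost grow faster than linearly, presumably to model large transactions becoming relatively more expensive. A sketch, assuming (from the chart titles, not from any stated formula) that the term x*f in model #1 becomes x*(f + f^p/4) for p = 1.25 or p = 2, with the rest of the model, speedup = 1 / ((1 - f) + c*term/n), unchanged; model1_speedup_scaled is a hypothetical name.

```python
from typing import Optional


def model1_speedup_scaled(f: float, n: int, x: float, c: float = 1.0,
                          p: Optional[float] = None) -> float:
    """Model #1 speedup with an optional super-linear transactional cost.

    p=None keeps the baseline term x*f; otherwise the term becomes
    x*(f + f**p / 4), following the chart titles. Assumed form:
    speedup = 1 / ((1 - f) + c * term / n).
    """
    work = f if p is None else f + f ** p / 4.0
    return 1.0 / ((1.0 - f) + x * c * work / n)


for p in (None, 1.25, 2.0):
    print(p, round(model1_speedup_scaled(0.9, 16, 2.0, p=p), 2))
```

For 0 < f < 1, f^1.25 is larger than f^2, so under this reading the p = 1.25 variant penalizes intermediate parallel fractions more heavily than p = 2; at f = 1 both add the same 25% to the transactional cost.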
Application model #2: 100% parallel
[Diagram: n threads, each running a transactional (Tx) part and a non-transactional (Non-tx) part]
t = fraction of original program that is transactional
n = number of parallel threads
x = straight-line transactional slow-down
c = mean number of attempts per transaction (1 => no conflicts)
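Model #2 can be sketched the same way. Assuming (our reading, not a formula from the talk) that all work is divided evenly over n threads, the non-transactional fraction 1 - t runs at sequential speed, and the transactional fraction t costs x*c times as much, the speedup is n / ((1 - t) + t*x*c). Solving for t gives the largest transactional fraction that still meets a target speedup, which is one way to read the later annotation that overheads rapidly reduce how much transactions can be used. Function names are hypothetical.

```python
from typing import Optional


def model2_speedup(t: float, n: int, x: float, c: float = 1.0) -> float:
    """Speedup over sequential under application model #2 (100% parallel).

    Assumed cost per thread: (1 - t) of non-transactional work at full speed
    plus t of transactional work stretched by x (slow-down) and c (attempts).
    """
    return n / ((1.0 - t) + t * x * c)


def max_transactional_fraction(target_speedup: float, n: int, x: float,
                               c: float = 1.0) -> Optional[float]:
    """Largest t in [0, 1] that still achieves target_speedup, or None if none does."""
    if target_speedup > n:
        return None  # even t = 0 only gives a speedup of n
    stretch = x * c - 1.0  # extra cost per unit of transactional work
    if stretch <= 0.0:
        return 1.0  # transactions cost nothing extra, so any t works
    return min(1.0, (n / target_speedup - 1.0) / stretch)


for x in (2.0, 3.0, 5.0):
    print(x, max_transactional_fraction(8, 16, x))  # 1.0, 0.5, 0.25
```

So, under these assumptions, keeping 8x on 16 threads with a 5x straight-line slow-down leaves room for only a quarter of the work to be transactional.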
Workloads (ASPLOS ’10)
[Figure: x (straight-line transactional slow-down) against t (transactional proportion, 0% to 100%), with the workloads Labyrinth, Genome, JBBAtomic, Vacation and MaxFlow marked]
[Figure: n=16, c=1.0 (no conflicts). x-axis: x (straight-line transactional slow-down), from 1 to about 5.6; y-axis: t (transactional proportion), from 0% to 100%]
Overheads rapidly reduce the extent to which transactions can be used
[Figure: n=16, c=1.1 (1..1024). x-axis: x (straight-line transactional slow-down); y-axis: t (transactional proportion)]
[Figure: n=16, c=1.4 (1..256). x-axis: x (straight-line transactional slow-down); y-axis: t (transactional proportion)]
[Figure: n=16, c=2.0 (1..64). x-axis: x (straight-line transactional slow-down); y-axis: t (transactional proportion)]
Conclusions
• Bad things come in threes...
  – Amdahl’s law
  – Sequential overhead
  – Conflicts
• When developing TM systems we need to be careful about the trade-offs between these
• There’s a risk of “chasing around the TM design space”
  – Sequential overhead
  – Scaling without conflicts
  – Scaling with conflicts