Calculating Stack Distances Efficiently
George Almasi, Calin Cascaval, David Padua
{galmasi,cascaval,padua}@cs.uiuc.edu
What this talk is, and is not, about
• This talk is about:
– algorithms to calculate stack distance histograms
– speed/memory optimization of trace analysis to create stack distance histograms
• This talk is not about:
– why stack distance histograms are/are not useful
– relative merits of inter-reference distance vs. stack distance
– speed/memory optimization of applications
Two measures of locality
• Inter-reference distance:
– the number of other references between two references to the same address in the trace
• Stack distance:
– the number of distinct addresses referenced between two references to the same address
• Example: in the trace a b c d b c d e a, the two references to a have inter-reference distance = 7 and stack distance = 4
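The two definitions can be checked directly against the example trace. The following is a minimal brute-force sketch; the function name `distances` and the 0-based indexing are assumptions made for illustration, not part of the talk.

```python
def distances(trace, t):
    """Return (inter_reference_distance, stack_distance) for the reference
    at index t, or None if trace[t] has not been referenced before."""
    x = trace[t]
    # Scan backwards for the previous reference to the same address.
    for t0 in range(t - 1, -1, -1):
        if trace[t0] == x:
            between = trace[t0 + 1:t]              # references strictly in between
            return len(between), len(set(between)) # all vs. distinct addresses
    return None


trace = list("abcdbcdea")
print(distances(trace, 8))  # (7, 4), the values quoted for the two references to a
```

This scan costs O(trace length) per reference, which is the cost the rest of the talk sets out to avoid.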
Stack Distances As Cache Misses
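• Compute the number of cache hits and misses for a cache of size C from the histogram s, where s(\delta) is the number of references with stack distance \delta:
– hits(C) = \sum_{\delta=1}^{C} s(\delta)
– misses(C) = \sum_{\delta=C+1}^{\infty} s(\delta)
[Figure: stack distance histogram; x-axis: stack distance (1, 3, 5, 7, Inf); y-axis: # references (0 to 25000)]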
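As an illustration of how the two sums are used, here is a small sketch that splits a stack distance histogram at a cache size C. The histogram values are made up for the example, and representing first-time references with `math.inf` is an assumption of this sketch.

```python
import math
from collections import Counter


def hits_and_misses(hist, cache_size):
    """Split a stack distance histogram into hits and misses for a fully
    associative LRU cache holding cache_size lines.
    hist maps stack distance -> number of references at that distance."""
    hits = sum(n for d, n in hist.items() if d <= cache_size)
    misses = sum(n for d, n in hist.items() if d > cache_size)
    return hits, misses


# Hypothetical histogram; math.inf stands for first-time (cold) references.
hist = Counter({1: 5, 2: 3, 4: 2, math.inf: 4})
print(hits_and_misses(hist, 2))  # (8, 6)
print(hits_and_misses(hist, 4))  # (10, 4)
```

One histogram therefore answers the hit/miss question for every cache size at once, which is the practical appeal of collecting stack distances.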
Inter-reference distance
• Given that at time t, ref(t) = x
• find t0, the time of the last previous reference to x: t_0 = \max\{\tau < t \mid ref(\tau) = ref(t)\}
• inter-reference distance: dist(t) = t - t_0
• Efficient implementation: a (hash) table H(x) = t0, the trace index of the last reference to x
– memory usage ~ 2x original program
– cost O(1) per reference
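The hash table scheme can be sketched in a few lines. This is an illustrative version only: the name `inter_reference_distances` is hypothetical, a Python `dict` plays the role of the table H, and the distance is reported as t - t0, which is one more than the number of references strictly between the two.

```python
def inter_reference_distances(trace):
    """Return t - t0 for every reference, or None for a first reference.
    last_seen plays the role of the table H(x) = t0."""
    last_seen = {}
    out = []
    for t, x in enumerate(trace):
        t0 = last_seen.get(x)                     # O(1) expected lookup
        out.append(None if t0 is None else t - t0)
        last_seen[x] = t                          # x was last referenced at t
    return out


print(inter_reference_distances("abcdbcdea"))
# [None, None, None, None, 3, 3, 3, None, 8]
```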
Stack distance
[Figure: snapshots of the LRU stack over successive references. A newly referenced address (h) is placed on top of the stack; a re-referenced address (x) is found at Depth(x), removed from that position, and moved to the top.]
Stack distance
• Simulates an infinite cache with LRU replacement policy
• nice properties (inclusion!)
• naïve implementation: stack as linked list/array
– m = 250,000 average maximum stack depth
– list traversal/array updates; O(m) per trace element
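For reference, the naïve approach might look like the sketch below. It keeps the LRU stack in a Python list with the most recently used address at the end; the function name and the convention of reporting the number of distinct addresses above the referenced one are assumptions of this sketch.

```python
def naive_stack_distances(trace):
    """LRU stack kept as a plain list, most recently used address last.
    Each reference costs O(m) for the search and the removal."""
    stack = []
    out = []
    for x in trace:
        if x in stack:
            pos = stack.index(x)
            # Everything above x was referenced since the last reference to x.
            out.append(len(stack) - 1 - pos)
            del stack[pos]
        else:
            out.append(None)          # first reference: infinite distance
        stack.append(x)               # x becomes the new stack top
    return out


print(naive_stack_distances("abcdbcdea"))
# [None, None, None, None, 2, 2, 2, None, 4]
```

With m in the hundreds of thousands, the linear search and removal dominate the analysis time, which motivates the tree-based schemes that follow.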
Insight: stack is contained in trace
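Trace (time →): a b b g e d f z f c e b c d a
Stack at time t (stack top at right): g z f e b c d a
[Figure: the stack at time t is the trace with every reference removed except the most recent one to each address.]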
Holes
• Index tx in the trace is a hole if the address ref(tx) has already been referenced again at a later time ty, with tx < ty < t.
• Using holes, we can say:
– stackdist(t) = refdist(t) - #holes(t0 to t)
• How many holes are there between t0 and t?
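The identity can be tried out with a deliberately simple sketch that stores the holes in a Python set and counts them linearly. It assumes refdist counts the references strictly between t0 and t (so refdist = t - t0 - 1 with 0-based indices); the function name is hypothetical.

```python
def stack_distances_with_holes(trace):
    """stackdist(t) = refdist(t) - #holes strictly between t0 and t.
    Holes are kept in a set and counted linearly, for clarity only."""
    last_seen = {}   # address -> index of its most recent reference
    holes = set()    # indices whose address has been referenced again later
    out = []
    for t, x in enumerate(trace):
        t0 = last_seen.get(x)
        if t0 is None:
            out.append(None)
        else:
            refdist = t - t0 - 1
            n_holes = sum(1 for h in holes if t0 < h < t)
            out.append(refdist - n_holes)
            holes.add(t0)            # the old reference to x is now a hole
        last_seen[x] = t
    return out


print(stack_distances_with_holes("abcdbcdea"))
# [None, None, None, None, 2, 2, 2, None, 4]
```

The linear count is the part that a tree of holes replaces with a logarithmic query.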
An interval tree of holes
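[Figure: the section of the trace between t0 (previous reference to a) and t (current reference to a), with holes marked; the holes are held in tree nodes as intervals such as k:k, k+2:k+3 and k+4:k+5.]
• Single tree operation: count_and_add(t0)
– determines the number of holes between t0 and t; adds a new hole at t0
– adding a hole can create a new interval, or fuse two existing ones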
Operations on the interval tree
• Add to interval edge: count_and_add(p), p = n+1
– k:n → k:n+1
• Create new interval: count_and_add(p), p > n+1
– k:n → k:n, p:p
• Join two intervals: count_and_add(n+1), with intervals k:n and n+2:p present
– k:n, n+2:p → k:p
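The three cases can be sketched with a flat, sorted list of disjoint intervals. This is not the balanced tree from the talk: the counting step here is linear in the number of intervals, whereas a balanced tree with per-subtree hole counts would make it logarithmic. The class name `HoleIntervals` is hypothetical, and the sketch assumes each position is added at most once.

```python
import bisect


class HoleIntervals:
    """Disjoint, sorted hole intervals stored as parallel start/end lists."""

    def __init__(self):
        self.starts = []
        self.ends = []

    def count_and_add(self, p):
        """Return the number of holes after position p, then add a hole at p."""
        i = bisect.bisect_right(self.starts, p)   # first interval starting after p
        count = sum(e - s + 1 for s, e in zip(self.starts[i:], self.ends[i:]))
        joins_left = i > 0 and self.ends[i - 1] == p - 1
        joins_right = i < len(self.starts) and self.starts[i] == p + 1
        if joins_left and joins_right:            # join two intervals
            self.ends[i - 1] = self.ends[i]
            del self.starts[i], self.ends[i]
        elif joins_left:                          # extend the left neighbour's edge
            self.ends[i - 1] = p
        elif joins_right:                         # extend the right neighbour's edge
            self.starts[i] = p
        else:                                     # create a new interval p:p
            self.starts.insert(i, p)
            self.ends.insert(i, p)
        return count


def stack_distances(trace):
    last_seen, holes, out = {}, HoleIntervals(), []
    for t, x in enumerate(trace):
        t0 = last_seen.get(x)
        out.append(None if t0 is None else (t - t0 - 1) - holes.count_and_add(t0))
        last_seen[x] = t
    return out


print(stack_distances("abcdbcdea"))  # [None, None, None, None, 2, 2, 2, None, 4]
```

Because every hole lies before the current time t, counting the holes after t0 is the same as counting the holes between t0 and t.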
Pre-allocated hole trees
• basics:
– tree is pre-allocated
– binary, balanced
– each node contains a number: the number of holes in its right subtree
– memory used by node depends on node’s depth
• a modified version of the Bennett and Kruskal (B&K) algorithm:
– holes instead of references
– binary instead of n-ary
– better memory usage
Pre-allocated hole trees
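[Figure: a pre-allocated balanced binary tree laid over the trace a b b g e d f z f c e b c d a. Each node holds a counter n; on the walk from the root to a trace position, a node's counter is either added to the running total (count += n) or incremented (n = n + 1).]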
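A pre-allocated tree of this kind might be sketched as an implicit binary tree stored in an array. The direction rule used here is an inference from the definition of the counters, not something stated on the slide: when the walk goes left, the node's counter (holes in its right subtree) is added to the count; when it goes right, the counter is incremented because the new hole lands in that right subtree. The class name is hypothetical, and every counter is a full Python integer, so the depth-dependent node size mentioned above is not modelled.

```python
class PreallocatedHoleTree:
    """Implicit balanced binary tree over trace positions 0..trace_length-1.
    right_holes[node] = number of holes in the right subtree of node."""

    def __init__(self, trace_length):
        self.leaves = 1
        while self.leaves < trace_length:
            self.leaves *= 2
        # Heap layout: root at index 1, children of i at 2i and 2i+1.
        self.right_holes = [0] * self.leaves

    def count_and_add(self, p):
        """Return the number of holes after position p, then add a hole at p."""
        node, lo, hi = 1, 0, self.leaves
        count = 0
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if p < mid:
                count += self.right_holes[node]   # all holes to the right are after p
                node, hi = 2 * node, mid
            else:
                self.right_holes[node] += 1       # new hole falls in the right subtree
                node, lo = 2 * node + 1, mid
        return count


tree = PreallocatedHoleTree(16)
print([tree.count_and_add(p) for p in (5, 7, 6, 2, 8, 4)])  # [0, 0, 1, 3, 0, 4]
```

Each call touches one array entry per tree level, with no pointers to follow and no rebalancing, at the price of allocating memory proportional to the trace length up front.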
Many Questions
• Q: Why holes and not stack elements?
• A: Holes need 1/2 the maintenance of stack elements.
• Q: Will the interval tree grow to ∞?
• A: No. Intervals fuse together spontaneously.
• Q: How big will the tree be?
• A: #of intervals = O(stack depth)
• Depth of a tree of stack elements would be the same size
• Q: Will the tree be unbalanced?
• A: Yes, because it tends to grow on one side.
More questions
• Q: what kind of interval tree?
• A: RB and AVL
• Q: Which is better?
• A: AVL is better.
• Q: Why?
• A:
– shorter average tree height: h+1 vs. 2h
– not all operations change the tree structure
Comparisons
• Interval trees:
– exec time O(log(m))
– memory usage O(m)
– AVL better than RB
– pointer chasing, bad locality
• Pre-allocated trees:
– exec time O(log(n))
– memory usage O(n): hits practical limit
– holes are better: reduced maintenance
– no pointer chasing, good locality
Results: hole interval trees
[Figure: bar chart; y-axis: slowdown relative to original app (0 to 400); x-axis: benchmarks adm, arc2d, bdna, dyfesm, flo52, mdg, ocean, qcd, spec77, spice, track, trfd; series: nul, avl, rb]
Results: preallocated trees
[Figure: bar chart; y-axis: slowdown relative to original app (0 to 200); x-axis: benchmarks adm, arc2d, bdna, dyfesm, flo52, mdg, ocean, qcd, spec77, spice, track, trfd; series: nul, pre, B&K]
Conclusions
• Stack distances with holes:
– using RB/AVL interval trees
– using pre-allocated trees
• Using holes reduces linear overhead by 20-40% for both kinds of algorithms.