calculating stack distances efficiently george almasi,calin cascaval,david padua...

Calculating Stack Distances Efficiently

George Almasi, Calin Cascaval, David Padua

{galmasi,cascaval,padua}@cs.uiuc.edu

What this talk is, and is not, about

• This talk is about:

– Algorithms to calculate stack distance histograms

– Speed/memory optimization of trace analysis to create stack distance histograms

• This talk is not about:

– why stack distance histograms are/are not useful

– relative merits of inter-reference distance vs. stack distance

– speed/memory optimization of applications

Two measures of locality

• Inter-reference distance:

– the number of other references between two references to the same address in the trace

• Stack distance:

– the number of distinct addresses referenced between two references to the same address

Example: in the trace a b c d b c d e a, the second reference to a has inter-reference distance = 7 and stack distance = 4.
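
The two definitions above can be checked by brute force on the example trace. This is only a sketch for illustration: the function name `distances` is hypothetical, and the scan is deliberately naive rather than efficient.

```python
def distances(trace, t):
    """Return (inter-reference distance, stack distance) for trace[t],
    using the definitions on this slide, or None if trace[t] has no
    earlier reference."""
    x = trace[t]
    # Scan backwards for the previous reference to the same address.
    for t0 in range(t - 1, -1, -1):
        if trace[t0] == x:
            between = trace[t0 + 1:t]          # references strictly between
            return len(between), len(set(between))
    return None


trace = "a b c d b c d e a".split()
print(distances(trace, 8))  # (7, 4)
```

The second `a` sees seven intervening references but only four distinct addresses (b, c, d, e).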

Stack Distances As Cache Misses

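• Given the histogram s(Δ), the number of references with stack distance Δ, compute the number of cache hits and misses for a cache of size C as follows:

$$hits(C) = \sum_{\Delta=1}^{C} s(\Delta)$$

$$misses(C) = \sum_{\Delta=C+1}^{\mathrm{Inf}} s(\Delta)$$

[Figure: histogram of stack distances; x-axis: stack distance (1, 3, 5, 7, ..., Inf), y-axis: # references (0 to 25000)]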
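
As a minimal sketch of the two sums, assume the histogram is stored as a dictionary from stack distance to reference count, with `math.inf` standing for first-time references. The function name and the histogram values are made up for illustration.

```python
import math


def hits_and_misses(hist, cache_size):
    """Split a stack distance histogram into hits and misses for a cache
    of the given size: distances 1..C hit, distances C+1..Inf miss."""
    hits = sum(n for d, n in hist.items() if d <= cache_size)
    misses = sum(n for d, n in hist.items() if d > cache_size)
    return hits, misses


# Hypothetical histogram: stack distance -> number of references.
hist = {1: 100, 2: 50, 3: 20, 5: 10, math.inf: 7}
print(hits_and_misses(hist, 3))  # (170, 17)
```

One histogram answers the question for every cache size C, which is why the trace only needs to be analyzed once.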

Inter-reference distance

• Given that at time t, ref(t) = x

• find t0, the time of the last previous reference to x:

$$t_0 = \max\{\tau \mid 0 \le \tau < t,\ ref(\tau) = ref(t)\}$$

• inter-reference distance:

$$dist(t) = t - t_0$$

• Efficient implementation: a (hash) table H(x) = t0, the trace index of the last reference to x

– Memory usage ~ 2x original program

– Cost O(1) per reference
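
The hash table approach can be sketched as follows, with a Python dictionary playing the role of H. The function name is hypothetical. The code returns t - t0 as in the formula on this slide; subtracting 1 gives the count of references strictly between the two.

```python
def inter_reference_distances(trace):
    """For each trace index t return t - t0, where t0 is the index of the
    previous reference to the same address, or None on a first reference."""
    last_seen = {}                      # H(x) = t0
    dists = []
    for t, x in enumerate(trace):
        t0 = last_seen.get(x)
        dists.append(None if t0 is None else t - t0)
        last_seen[x] = t                # O(1) update per reference
    return dists


trace = "a b c d b c d e a".split()
print(inter_reference_distances(trace))
# [None, None, None, None, 3, 3, 3, None, 8]
```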

Stack distance

[Figure: successive snapshots of the LRU stack a, b, c, d, e, ..., x, y, z, u, v, f. Referencing a new address h pushes it onto the top of the stack; referencing x again moves x from Depth(x) to the top of the stack.]

Stack distance

• Simulates an infinite cache with LRU replacement policy

• nice properties (inclusion!)

• naïve implementation: stack as linked list/array

– m = 250,000 average maximum stack depth

– list traversal/array updates; O(m) per trace element
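
A minimal sketch of the naïve array-based implementation is shown below. It assumes the depth convention in which the top of the stack has depth 1 and a first-time reference has distance Inf; depth minus 1 is the number of distinct addresses referenced in between. The function name is hypothetical.

```python
import math


def naive_stack_distances(trace):
    """Simulate an LRU stack with a Python list (top of stack = end of list).
    Returns the depth of each referenced address before it moves to the top."""
    stack = []
    out = []
    for x in trace:
        if x in stack:                       # O(m) search
            i = stack.index(x)
            out.append(len(stack) - i)       # depth, counted from the top
            del stack[i]                     # O(m) array update
        else:
            out.append(math.inf)             # never referenced before
        stack.append(x)                      # x becomes the new stack top
    return out


trace = "a b c d b c d e a".split()
print(naive_stack_distances(trace))
# [inf, inf, inf, inf, 3, 3, 3, inf, 5]
```

Both the search and the deletion touch up to m elements, which is the O(m) per-reference cost that the rest of the talk sets out to avoid.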

Insight: stack is contained in trace

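Trace (time increases to the right, up to time = t): a b b g e d f z f c e b c d a

Stack at time = t (stack top on the right): g z f e b c d a

The stack is the trace with everything but the most recent reference to each address removed.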
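
The insight can be reproduced in a few lines: keeping only the last occurrence of each address in the trace yields the stack. This is a sketch with a hypothetical function name, not the talk's implementation.

```python
def stack_from_trace(trace):
    """Return the LRU stack (bottom first, top last) implied by a trace:
    the trace positions that hold the most recent reference to an address."""
    last = {x: t for t, x in enumerate(trace)}      # last occurrence index
    return [x for t, x in enumerate(trace) if last[x] == t]


trace = "a b b g e d f z f c e b c d a".split()
print(" ".join(stack_from_trace(trace)))  # g z f e b c d a
```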

Holes

• Index tx in the trace is a hole at time t if the address ref(tx) has been referenced again at a later time ty, with tx < ty < t.

• Using holes, we can say:

– stackdist(t) = refdist(t) - #holes(t0 to t)

• How many holes are there between t0 and t?
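
The identity can be checked by brute force. The sketch below assumes refdist(t) = t - t0, so the result is the depth of the address in the stack (top = 1). The function names are hypothetical and the code is quadratic on purpose, to stay close to the definition of a hole.

```python
def holes_before(trace, t):
    """Indices tx < t whose address is referenced again at some ty
    with tx < ty < t."""
    return {tx for tx in range(t) if trace[tx] in trace[tx + 1:t]}


def stack_distance_via_holes(trace, t):
    """stackdist(t) = refdist(t) - #holes(t0 to t), or None if trace[t]
    has no previous reference."""
    previous = [i for i in range(t) if trace[i] == trace[t]]
    if not previous:
        return None
    t0 = previous[-1]
    n_holes = sum(1 for h in holes_before(trace, t) if t0 < h < t)
    return (t - t0) - n_holes


trace = "a b c d b c d e a".split()
print(stack_distance_via_holes(trace, 8))  # 5
```

For the last `a`, positions 1, 2 and 3 are holes (b, c and d are referenced again later), so 8 - 3 = 5: four distinct addresses lie above `a` in the stack.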

An interval tree of holes

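[Figure: the trace between t0 (previous reference to a) and t (current reference to a), with holes marked; the holes are grouped into intervals such as k:k, k+2:k+3 and k+4:k+5, which form the nodes of the tree.]

Single tree operation: count_and_add(t0)

• Determines the number of holes between t0 and t; adds a new hole at t0

• Adding a hole can create a new interval or fuse two existing ones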

Operations on the interval tree

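• Add to interval edge: count_and_add(p), p = n+1

– k:n becomes k:n+1

• Create new interval: count_and_add(p), p > n+1

– k:n stays as it is and a new interval p:p is created

• Join two intervals: count_and_add(p), p = n+1

– k:n and n+2:p fuse into k:p (in these interval labels, p is the right end of the second interval)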
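
The three cases can be sketched with a sorted list of disjoint hole intervals. This is an assumption made for brevity: the talk uses balanced (AVL or red-black) interval trees, which give O(log m) per operation, whereas the list below counts holes in time linear in the number of intervals. The class and function names are hypothetical.

```python
import bisect


class HoleIntervals:
    """Disjoint, sorted intervals of holes; interval i is starts[i]:ends[i]."""

    def __init__(self):
        self.starts = []
        self.ends = []

    def count_and_add(self, p):
        """Return the number of holes after position p, then add a hole at p.
        Assumes p is not already a hole."""
        i = bisect.bisect_left(self.starts, p)   # intervals i.. lie after p
        count = sum(e - s + 1 for s, e in zip(self.starts[i:], self.ends[i:]))
        joins_left = i > 0 and self.ends[i - 1] == p - 1
        joins_right = i < len(self.starts) and self.starts[i] == p + 1
        if joins_left and joins_right:           # fuse two intervals
            self.ends[i - 1] = self.ends[i]
            del self.starts[i], self.ends[i]
        elif joins_left:                         # extend right edge
            self.ends[i - 1] = p
        elif joins_right:                        # extend left edge
            self.starts[i] = p
        else:                                    # create new interval p:p
            self.starts.insert(i, p)
            self.ends.insert(i, p)
        return count


def stack_distances(trace):
    """Depth-style stack distance (t - t0 - #holes) for every reference."""
    holes, last, out = HoleIntervals(), {}, []
    for t, x in enumerate(trace):
        if x in last:
            out.append((t - last[x]) - holes.count_and_add(last[x]))
        else:
            out.append(None)                     # first reference
        last[x] = t
    return out


trace = "a b b g e d f z f c e b c d a".split()
print(stack_distances(trace))
# [None, None, 1, None, None, None, None, None, 2, None, 5, 7, 3, 6, 8]
```

All holes lie before the current time t, so counting the holes after t0 is the same as counting the holes between t0 and t.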

Pre-allocated hole trees

• basics:

– tree is pre-allocated

– binary, balanced

– each node contains a number: the number of holes in its right subtree

– memory used by node depends on node’s depth

• a modified version of the B&K algorithm:

– holes instead of references

– binary instead of n-ary

– better memory usage

Pre-allocated hole trees

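[Figure: pre-allocated binary tree built over the trace a b b g e d f z f c e b c d a; each node stores n, the number of holes in its right subtree. While descending towards the leaf for t0, a step into the left subtree does count += n and a step into the right subtree does n = n+1.]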
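
A pre-allocated hole tree can be sketched as an implicit complete binary tree stored in a flat array, one counter per internal node. The heap-style layout, the class name and the driver function are assumptions made for illustration, and the sketch uses full-width integers for every counter instead of sizing them by node depth.

```python
class PreallocatedHoleTree:
    """Implicit balanced binary tree over trace positions 0..n-1.
    right[i] is the number of holes in the right subtree of node i."""

    def __init__(self, n):
        self.size = 1
        while self.size < n:
            self.size *= 2
        self.right = [0] * self.size          # internal nodes 1..size-1

    def count_and_add(self, p):
        """Return the number of holes after position p, then add a hole at p."""
        count = 0
        node, lo, hi = 1, 0, self.size        # node covers positions [lo, hi)
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if p < mid:
                count += self.right[node]     # whole right subtree is after p
                node, hi = 2 * node, mid
            else:
                self.right[node] += 1         # new hole lands in right subtree
                node, lo = 2 * node + 1, mid
        return count


def stack_distances(trace):
    """Depth-style stack distance (t - t0 - #holes) for every reference."""
    tree, last, out = PreallocatedHoleTree(len(trace)), {}, []
    for t, x in enumerate(trace):
        if x in last:
            out.append((t - last[x]) - tree.count_and_add(last[x]))
        else:
            out.append(None)                  # first reference
        last[x] = t
    return out


trace = "a b b g e d f z f c e b c d a".split()
print(stack_distances(trace))
# [None, None, 1, None, None, None, None, None, 2, None, 5, 7, 3, 6, 8]
```

Each operation walks one root-to-leaf path, so the cost is O(log n) in the trace length n, with no pointers to chase.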

Many Questions

• Q: Why holes and not stack elements?

• A: Holes need 1/2 the maintenance of stack elements.

• Q: Will the interval tree grow to ?

• A: No. Intervals fuse together spontaneously.

• Q: How big will the tree be?

• A: #of intervals = O(stack depth)

• Depth of a tree of stack elements would be the same size

• Q: Will the tree be unbalanced?

• A: Yes, because it tends to grow on one side.

More questions

• Q: what kind of interval tree?

• A: RB and AVL

• Q: Which is better?

• A: AVL is better.

• Q: Why?

• A:

– shorter average tree height: h+1 vs. 2h

– not all operations change the tree structure

Comparisons

• Interval trees:

• exec time O(log(m))

• memory usage O(m)

• AVL better than RB

• pointer chasing, bad locality

• Pre-allocated trees:

• exec time O(log(n))

• memory usage O(n)

– hits practical limit

• holes are better

– reduced maintenance

• no pointer chasing, good locality

Results: hole interval trees

[Figure: bar chart of slowdown relative to original app (y-axis, 0 to 400) for the benchmarks adm, arc2d, bdna, dyfesm, flo52, mdg, ocean, qcd, spec77, spice, track and trfd; series: nul, avl, rb.]

Results: preallocated trees

[Figure: bar chart of slowdown relative to original app (y-axis, 0 to 200) for the benchmarks adm, arc2d, bdna, dyfesm, flo52, mdg, ocean, qcd, spec77, spice, track and trfd; series: nul, pre, B&K.]

Conclusions

• Stack distances with holes:

– using RB/AVL interval trees

– using pre-allocated trees

• Using holes reduces linear overhead by 20-40% for both kinds of algorithms.
