Generational Stack Collection and Profile-Driven Pretenuring

Perry Cheng, Robert Harper, Peter Lee

Presented by Moti Alperovitch (moti@nmt.co.il)

Post on 20-Dec-2015



The problem

• Some data die young, and some data die old.

• In deeply recursive programs, most of the stack is unwound very infrequently.

• Scanning unchanged roots can dominate collection time.

We compare the following collectors

• Semispace copying collection (Cheney).

• Generational collection.

• Generational collection with stack markers.

• Pretenuring with stack markers.

Semispace copy collection

• Scan the stack for roots and copy the data that is reachable from the roots to unused areas (Nursery, Survivor).

• Disadvantage:

– All reachable data is copied on every collection, although some data die young and some die old.
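The copying step described above can be sketched in a few lines. This is a minimal illustration, not the collector used in the paper: the `Ptr` wrapper, the list-of-lists heap model and the `forward` dictionary (standing in for forwarding pointers stored in the old objects) are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ptr:
    """Hypothetical pointer representation: an index into a space."""
    addr: int


def cheney_collect(from_space, roots):
    """Copy everything reachable from `roots` into a fresh to-space.

    Objects are lists of fields; a field is either an int or a Ptr.
    """
    to_space = []
    forward = {}  # old address -> new address (stands in for forwarding pointers)

    def copy(ptr):
        if ptr.addr not in forward:
            forward[ptr.addr] = len(to_space)
            to_space.append(list(from_space[ptr.addr]))
        return Ptr(forward[ptr.addr])

    new_roots = [copy(r) for r in roots]
    scan = 0  # objects before `scan` have had all their fields updated
    while scan < len(to_space):
        obj = to_space[scan]
        for i, field in enumerate(obj):
            if isinstance(field, Ptr):
                obj[i] = copy(field)
        scan += 1
    return to_space, new_roots


# Objects 0 and 2 form a cycle reachable from the root; 1 and 3 are garbage.
from_space = [[1, Ptr(2)], [99], [Ptr(0), 5], [Ptr(1)]]
to_space, new_roots = cheney_collect(from_space, [Ptr(0)])
```

Every reachable object is copied on each call, regardless of how long it has already survived, which is the disadvantage the slide points out.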

Generational collection

• Based on semispace copy collection.

• Divides the heap into areas according to object lifetime.

• Disadvantages:

– For programs with deep call chains, stack scanning can take a lot of time.

– Long-lived objects are typically copied several times before they are tenured.

Generational stack collection

• Uses stack markers to cache the results of the root scan.

• Disadvantage:

– Long-lived objects are typically copied several times before they are tenured.

Pretenuring

• Make a profiling run to build a lifetime profile of the objects created at each allocation site.

TIL Compiler

• Optimizing compiler for Standard ML (SML).

• Intensional polymorphism.

• Nearly tag-free garbage collection.

• Conventional functional-language optimizations.

• Loop optimizations.

Stack Scanning

• At any execution point, data is live if it will be accessed as the program continues to execute.

• The collector must retain all data that is reachable by following pointers from the roots.

• The roots are registers and stack slots.

Difficulties

• Determining the root set accurately.

• With callee-save registers, the content of a register or stack slot can come from a caller's frame, so stack frames cannot be decoded in isolation.

• With polymorphism, the compiler cannot always determine statically whether a value is a pointer or not.

Finding the roots

• When the GC is called from the mutator, the return address (RA) indicates the current execution point.

• Looking up the RA in a table gives the frame layout of the GC's caller.

• Continuing this way, frame by frame, we can find the roots.

Finding the roots

• Determine the root set starting from the initial frame, by scanning downwards.

• Scanning in both directions is needed because the type of some stack slots depends on a previous stack slot.
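The RA-driven frame walk can be sketched as follows. This is an illustrative model only: the flat `stack` list, the `FrameLayout` record and the convention that word 0 of each frame holds the return address into its caller (with 0 marking the bottom frame) are assumptions for the example, not the TIL runtime's actual layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameLayout:
    size: int  # number of stack words occupied by the frame


def walk_frames(stack, gc_return_address, table):
    """Return (ra, start_index, size) for every frame, newest frame first.

    Assumed layout: stack[0] is the top of the stack, and word 0 of each
    frame holds the return address into its caller (0 for the bottom frame).
    """
    frames = []
    ra, sp = gc_return_address, 0
    while ra != 0:
        layout = table[ra]  # the RA identifies the layout of the frame at sp
        frames.append((ra, sp, layout.size))
        ra = stack[sp]      # saved RA identifies the caller's layout
        sp += layout.size
    return frames


table = {0x100: FrameLayout(3), 0x200: FrameLayout(2), 0x300: FrameLayout(4)}
stack = [0x200, 11, 12,     # frame of the GC's caller (layout found via 0x100)
         0x300, 21,         # its caller
         0, 31, 32, 33]     # bottom frame
frames = walk_frames(stack, 0x100, table)
```

The list comes out newest frame first; a second pass in the opposite order can then resolve slots whose status depends on an older frame.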

Trace table information

• The return address (RA).

• Stack frame size.

• For each stack slot we record its trace:

– Pointer: the compiler statically determined that it is a pointer.

– Non-pointer: the value is not a root.

– Callee-save (plus a register): callee-save information for that register.

Trace table information - 2

– Compute: the compiler could not statically determine the pointer status of the value. Additional information records where the type of the value resides.
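A rough sketch of how the four kinds of trace might be interpreted is shown below. The `Entry`, `CalleeSave` and `Compute` records, the `UNBOXED` set of runtime type tags and the register name `$10` are all hypothetical; the sketch also returns root values rather than root addresses and ignores the live registers of the newest frame.

```python
from dataclasses import dataclass, field

POINTER, NON_POINTER = "pointer", "non_pointer"


@dataclass(frozen=True)
class CalleeSave:
    reg: str        # the slot (or register) still holds the caller's value of reg


@dataclass(frozen=True)
class Compute:
    type_slot: int  # slot in the same frame that holds the runtime type


@dataclass
class Entry:
    slot_traces: list
    reg_traces: dict = field(default_factory=dict)  # register traces at the call site


UNBOXED = {"int", "real"}  # hypothetical runtime type tags for non-pointer values


def is_root(trace, slots, caller_regs):
    if trace == POINTER:
        return True
    if trace == NON_POINTER:
        return False
    if isinstance(trace, CalleeSave):
        return caller_regs.get(trace.reg, False)
    if isinstance(trace, Compute):
        return slots[trace.type_slot] not in UNBOXED
    raise ValueError(f"unknown trace: {trace!r}")


def find_roots(frames, table):
    """frames: list of (ra, slots) pairs, oldest frame first."""
    roots, regs = [], {}
    for ra, slots in frames:
        entry = table[ra]
        for i, trace in enumerate(entry.slot_traces):
            if is_root(trace, slots, regs):
                roots.append(slots[i])
        # Register status at this frame's call site, used by the next (newer) frame.
        regs = {r: is_root(t, slots, regs) for r, t in entry.reg_traces.items()}
    return roots


table = {
    0x10: Entry([POINTER, NON_POINTER], {"$10": POINTER}),
    0x20: Entry([CalleeSave("$10"), NON_POINTER, Compute(1)], {"$10": CalleeSave("$10")}),
    0x30: Entry([CalleeSave("$10"), NON_POINTER, Compute(1)]),
}
frames = [
    (0x10, ["objA", 7]),
    (0x20, ["objB", "int", 42]),
    (0x30, ["objC", "list", "objD"]),
]
roots = find_roots(frames, table)
```

Processing the frames oldest first is what lets a callee-save slot be resolved: by the time a frame is decoded, the pointer status of each register in its caller is already known.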

Stack frames and the corresponding table entry.

[Figure: a stack frame (RA = 0x2001c718, slots 1 to 6) shown beside its table entry. The table entry records RA = 0x2001c718, frame size = 6, and per-slot traces that include Non Pointer, Pointer, Pointer, Compute: Stack 4 and Compute: Callee $10, followed by trace information for registers.]

Semispace versus generational collection: time for K = 1.5

[Bar chart: time in ms (axis 0 to 100) for the SemiSpace and Generational collectors on each program: CheckSum, Color, FFT, Grobner, Knuth-Bendix, Lexgen, Life, Peg, PIA, Simple.]

Semispace versus generational collection: time for K = 4

[Bar chart: time in ms (axis 0 to 60) for the SemiSpace and Generational collectors on each program: CheckSum, Color, FFT, Grobner, Knuth-Bendix, Lexgen, Life, Peg, PIA, Simple.]

Semispace versus generational collection: number of GCs for K = 1.5

[Bar chart: number of collections (axis 0 to 35000) for the SemiSpace and Generational collectors on each program: CheckSum, Color, FFT, Grobner, Knuth-Bendix, Lexgen, Life, Peg, PIA, Simple.]

Semispace versus generational collection: number of GCs for K = 4

[Bar chart: number of collections (axis 0 to 12000) for the SemiSpace and Generational collectors on each program: CheckSum, Color, FFT, Grobner, Knuth-Bendix, Lexgen, Life, Peg, PIA, Simple.]

Stack marking

• When the stack is deep, scanning the roots can take a dominant share of GC time.

• Most of the stack usually does not change between the previous GC and the current GC.

• Marking the stack frames that have not changed can significantly speed up root scanning.

Marking the stack - 1st method

• In each stack frame, add a flag that records whether the frame has changed. The collector resets the flag when it scans the frame, and the mutator sets it.

• Disadvantages:

– The mutator is involved in the GC process.

– The compiled code must perform extra operations for the GC on every return, although most of the time the GC is not running.

Marking the stack - 2nd method

• When scanning the roots, replace the RA of every n-th stack frame with the address of a special stub function.

• The stub function keeps a table of the original RAs.

• The stub function notes that the frame was deactivated and continues to the original RA.

Marking the stack - 2nd method (continued)

• The problems with this method:

– Functions do not always return normally.

– When an exception is raised, handlers are tried in stack order until a matching handler is found.

– Fortunately, we can maintain a value M, updated on exceptions, that holds the shallowest stack pointer reached as a result of a raised exception.
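The marker scheme, including the exception watermark M, can be simulated as below. This is a sketch under simplifying assumptions: frames are Python lists of root values, "installing a stub" is modelled by remembering frame depths in a set, and the class and method names (`MarkedStack`, `raise_to`, `collect_roots`) are invented for the example. It caches root values and ignores the fact that a copying collector moves objects.

```python
class MarkedStack:
    """Simulated call stack; index 0 is the oldest frame."""

    def __init__(self, interval=2):
        self.frames = []
        self.interval = interval  # place a marker on every n-th frame
        self.markers = set()      # depths whose return address points at the stub
        self.watermark = 0        # M: shallowest top of stack reached by an exception
        self.cached = []          # per-frame roots recorded by the previous scan
        self.frames_scanned = 0   # statistics only

    def call(self, roots):
        self.frames.append(list(roots))

    def ret(self):
        depth = len(self.frames) - 1
        self.frames.pop()
        self.markers.discard(depth)  # the stub runs and deactivates this marker

    def raise_to(self, handler_depth):
        """Unwind to the handler's frame without running any stubs."""
        del self.frames[handler_depth + 1:]
        self.watermark = min(self.watermark, handler_depth)

    def collect_roots(self):
        # A marker at depth m that is still intact (and not above M) proves
        # that frames 0..m-1 have not run since the previous scan.
        intact = [m for m in self.markers if m <= self.watermark]
        reuse = max(intact, default=0)
        scanned = [list(f) for f in self.frames[reuse:]]
        self.frames_scanned += len(scanned)
        self.cached = self.cached[:reuse] + scanned
        top = len(self.frames) - 1
        self.markers = set(range(self.interval, top + 1, self.interval))
        self.watermark = top
        return [r for frame in self.cached for r in frame]


stack = MarkedStack(interval=2)
for i in range(10):
    stack.call([f"r{i}"])
first = stack.collect_roots()      # scans all 10 frames
stack.call(["tmp"])
stack.ret()
stack.frames[-1][0] = "r9-updated"  # only the active frame can change
second = stack.collect_roots()     # rescans frames 8 and 9 only
stack.raise_to(3)
third = stack.collect_roots()      # rescans frames 2 and 3 only
```

In this run the second and third scans each touch two frames instead of the whole stack, which is the saving the slides attribute to stack markers on deep stacks.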

Stack marker improvement

[Bar chart: improvement in % (axis -10 to 80) from stack markers on each program: CheckSum, Color, FFT, Grobner, KB, Lexgen, Life, Nqueen, Peg, PIA, Simple.]

Pretenuring

• Use profile data to predict the survival rate of an object.

• We speculate that objects allocated from the same place in the program tend to have similar lifetimes.

• To test this hypothesis, we identify the heap allocation sites in the program.

Pretenuring - 2

• The compiler is modified so that each allocation updates a table of allocation sites.

• During garbage collection the entries are updated.

• After each collection we scan the allocation area to locate dead objects and update the entries of their allocation sites.

Pretenuring - 3

• Using this information we can compute statistics on the number, size and average age of the objects created at each allocation site.

• We include only allocation sites that account for at least 1% of the allocations or 1% of the copied data.
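The per-site bookkeeping and the 1% cutoff could look roughly like this. The `SiteProfile` class and its hook names are hypothetical, "age" is taken to mean the number of collections an object survived, and the allocation share is measured in bytes; the slides do not say whether the 1% refers to bytes or to object counts.

```python
from collections import defaultdict


class SiteProfile:
    """Per-allocation-site counters updated by the mutator and the collector."""

    def __init__(self):
        self.alloc_count = defaultdict(int)
        self.alloc_bytes = defaultdict(int)
        self.copied_bytes = defaultdict(int)
        self.death_count = defaultdict(int)
        self.death_age_sum = defaultdict(int)

    def on_alloc(self, site, size):   # called at each allocation
        self.alloc_count[site] += 1
        self.alloc_bytes[site] += size

    def on_copy(self, site, size):    # called when the collector copies an object
        self.copied_bytes[site] += size

    def on_death(self, site, age):    # called when a dead object is found after a GC
        self.death_count[site] += 1
        self.death_age_sum[site] += age

    def average_age(self, site):
        deaths = self.death_count[site]
        return self.death_age_sum[site] / deaths if deaths else None

    def significant_sites(self, threshold=0.01):
        """Sites with at least `threshold` of allocated bytes or of copied bytes."""
        total_alloc = sum(self.alloc_bytes.values())
        total_copied = sum(self.copied_bytes.values())
        keep = []
        for site in sorted(self.alloc_bytes):
            by_alloc = total_alloc > 0 and self.alloc_bytes[site] >= threshold * total_alloc
            by_copy = total_copied > 0 and self.copied_bytes[site] >= threshold * total_copied
            if by_alloc or by_copy:
                keep.append(site)
        return keep


profile = SiteProfile()
for _ in range(1000):                 # site A: many short-lived objects
    profile.on_alloc("A", 16)
    profile.on_death("A", 0)
for _ in range(5):                    # site B: few objects, copied many times
    profile.on_alloc("B", 16)
    for _ in range(10):
        profile.on_copy("B", 16)
profile.on_alloc("C", 16)             # site C: negligible on both measures
significant = profile.significant_sites()
```

Site B allocates very little but is kept because it accounts for the copied data, which is the kind of site pretenuring targets.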


The profile results


The results

• The results show that 90% of the allocations have a very short lifetime, but 96-99% of the copied data is generated from 4 sites.

Using the profile data

• Objects created at allocation sites with long lifetimes are allocated directly in the older generation.

• Problem: an object allocated directly in the older generation may hold a reference to an object in the younger generation.

Solutions?

• Allocate that kind of object in the young generation.

– May lead to a lot more copying.

• Remember the area of the older generation that holds references to the young generation, and scan it on each minor collection.

– Scanning without copying does not take much time.
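The second solution can be illustrated with a toy two-generation heap. Everything here is a simplified model: `Obj`, `Heap` and the site names are invented, "copying" is simulated by moving objects between Python lists and counting them, and the sketch assumes no mutation, so the only old-to-young references come from objects pretenured since the last minor collection.

```python
class Obj:
    def __init__(self, site, fields):
        self.site = site
        self.fields = fields  # other Obj instances (pointers) or ints
        self.old = False


class Heap:
    def __init__(self, pretenured_sites=()):
        self.pretenured_sites = set(pretenured_sites)
        self.nursery, self.old = [], []
        self.scan_region = []    # pretenured objects allocated since the last minor GC
        self.objects_copied = 0

    def allocate(self, site, fields):
        obj = Obj(site, list(fields))
        if site in self.pretenured_sites:
            obj.old = True                 # allocated directly in the old generation
            self.old.append(obj)
            self.scan_region.append(obj)   # remembered: may point into the nursery
        else:
            self.nursery.append(obj)
        return obj

    def minor_collect(self, roots):
        work = [r for r in roots if isinstance(r, Obj)]
        for obj in self.scan_region:       # scanned, but not copied
            work.extend(f for f in obj.fields if isinstance(f, Obj))
        while work:
            obj = work.pop()
            if obj.old:
                continue
            obj.old = True                 # "copy" the survivor into the old generation
            self.old.append(obj)
            self.objects_copied += 1
            work.extend(f for f in obj.fields if isinstance(f, Obj))
        self.nursery.clear()
        self.scan_region.clear()


def run(pretenured_sites):
    heap = Heap(pretenured_sites)
    leaf = heap.allocate("cell", [1])
    table = heap.allocate("table", [leaf])   # long-lived object pointing at a young one
    junk = heap.allocate("cell", [2])
    heap.minor_collect(roots=[table])
    return heap, leaf, junk


plain, _, _ = run(pretenured_sites=())
tenured, leaf, junk = run(pretenured_sites={"table"})
```

With pretenuring the long-lived `table` object is never copied, yet the young `leaf` it references still survives because the remembered region is scanned during the minor collection.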

Improvement from pretenuring (ms)

                Generational collection    Generational collection with pretenuring
Program         K=1.5   K=2.0   K=4.0      K=1.5   K=2.0   K=4.0      % Improve
Knuth-Bendix     7.66    8.00    8.07       1.44    1.76    1.88      33
Lexgen           3.20    2.58    2.43       2.63    2.00    1.55      27
Nqueen           1.83    1.86    1.95      13.88   14.03   13.53      50
Simple           5.05    4.81    4.33       3.58    3.74    3.71      12

Improvement from pretenuring (bytes copied)

                Generational collection                  Generational collection with pretenuring
Program         K=1.5        K=2.0        K=4.0          K=1.5        K=2.0        K=4.0          % Improve
Knuth-Bendix    14,569,800   17,869,436   17,695,560      2,050,212    5,376,156    5,151,708     70
Lexgen          27,427,544   18,647,632   16,435,292     24,278,388   15,452,696   13,397,340     18
Nqueen           5,312,548    5,312,548    5,312,548        194,256      194,256      194,256     96
Simple          25,771,348   25,431,144   25,430,248     14,241,500   14,734,176   14,133,376     44

Comparing all the methods

[Bar chart: axis 0 to 120, comparing Generational, Stack Markers, and Pretenuring with Stack Markers on each program: Color, Grobner, KB, Lexgen, Life, Nqueen, PIA, Simple.]

Conclusion for pretenuring

• The reduction in GC time is smaller than expected from the reduction in data copied.

• Since we still have to check the younger generations, GC time remains proportional to the live data (with a smaller constant).

Suggestion to improve the speed

• Apply control-flow and data-flow analysis to objects.

Conclusions

• The generational collector is twice as fast in GC time. It also improves cache locality, which further improves GC time.

• For programs with deep stacks, caching the root data can improve GC time by up to 74%.

• Profiling the heap can improve speed by 50% in some cases.


The End