Post on 20-Dec-2015
Generational Stack Collection and Profile-Driven Pretenuring
Perry Cheng, Robert Harper, Peter Lee
Presented By Moti Alperovitch
The problem
• Some data die young, and some data die old.
• In deeply recursive programs, most of the stack is unwound very infrequently.
• Scanning unchanged roots can take up a dominant share of collection time.
We compare the following collectors
• Semispace copying collection (Cheney).
• Generational collection.
• Generational collection with stack markers.
• Pretenuring with stack markers.
Semispace copy collection
• Scan the stack for roots, and copy the data reachable from the roots to an unused area (Nursery, Survive).
• Disadvantage:
– All live data is copied on every collection, whether it dies young or old.
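The copying step can be sketched as follows. This is a minimal illustration of Cheney-style copying, not the TIL collector: the heap model (a list of objects whose fields are `("ptr", index)` or `("int", value)` pairs) and the name `cheney_collect` are assumptions made for the example.

```python
def cheney_collect(from_space, roots):
    """Copy everything reachable from `roots` into a fresh to-space.

    from_space: list of objects; each object is a list of fields.
    A field is ("ptr", index_into_from_space) or ("int", value).
    roots: list of from-space indices.
    Returns (to_space, new_roots) with all pointers rewritten.
    """
    to_space = []
    forward = {}  # forwarding table: from-space index -> to-space index

    def copy(idx):
        # Copy an object once; later visits reuse the forwarding entry.
        if idx not in forward:
            forward[idx] = len(to_space)
            to_space.append(list(from_space[idx]))
        return forward[idx]

    new_roots = [copy(r) for r in roots]

    # Cheney scan: to_space itself is the work queue.
    scan = 0
    while scan < len(to_space):
        obj = to_space[scan]
        for i, (kind, val) in enumerate(obj):
            if kind == "ptr":
                obj[i] = ("ptr", copy(val))
        scan += 1
    return to_space, new_roots


heap = [
    [("int", 1), ("ptr", 2)],  # object 0: reachable from the root
    [("int", 99)],             # object 1: garbage
    [("ptr", 0)],              # object 2: reachable through object 0
]
to_space, new_roots = cheney_collect(heap, roots=[0])
```

Every reachable object is copied regardless of its age, which is the disadvantage the slide points out.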
Generational collection
• Based on semispace copying collection.
• Divides the heap into areas according to object lifetime.
• Disadvantages:
– For programs with deep call chains, stack scanning can take a lot of time.
– Long-lived objects are typically copied several times before they are tenured.
Generational stack collection
• Uses stack markers to cache the results of the root scan.
• Disadvantage:
– Long-lived objects are typically copied several times before they are tenured.
Pretenuring
• A profiling run builds a lifetime profile for objects according to their allocation site.
TIL Compiler
• Optimizing compiler for Standard ML (SML).
• Intensional polymorphism.
• Nearly tag-free garbage collection.
• Conventional functional-language optimizations.
• Loop optimizations.
Stack Scanning
• At any execution point, data is live if it will be accessed as the program continues to execute.
• The collector must retain all data that is reachable by following pointers from the roots.
• The roots are registers and stack slots.
Difficulties
• Determining the root set accurately.
• With callee-save registers, the contents of a register or stack slot can come from a caller's frame, so stack frames cannot be decoded in isolation.
• With polymorphism, the compiler cannot always determine statically whether a value is a pointer or not.
Finding the roots
• When the GC is called from the mutator, the return address (RA) indicates the current execution point.
• Looking up the RA in a table gives the layout of the frame that called the GC.
• Continuing this way from frame to frame, we can find the roots.
Finding the roots - 2
• Determine the root set starting from the initial frame, by scanning downwards.
• Scanning in both directions is needed, since the type of some stack slots depends on a previous stack slot.
Trace table information
• The Return address (RA).
• Stack frame size.
• For each stack slot we record its trace:
– Pointer: the compiler statically determined that it is a pointer.
– Non Pointer: the value is not a root.
– Callee-save + (Register): callee-save information.
Trace table information - 2
– Compute: the compiler could not statically determine the pointer status of the value. The trace carries additional information saying where the type of the value resides.
Stack frames and the corresponding table entry
[Figure: a stack frame with six slots (Slot 1 to Slot 6) and RA=0x2001c718, shown next to its table entry. The table entry records RA=0x2001c718, frame size = 6, one trace per slot (among them Non Pointer, Pointer, Pointer, Compute: Stack 4 and Compute: Callee $10), and trace information for registers.]
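The table-driven walk can be sketched as follows. This is a simplified model under stated assumptions, not the TIL data layout: the field names `size`, `ra_slot` and `traces`, the flat-list stack, the use of RA 0 to mark the bottom of the stack, and the function name `find_roots` are all hypothetical. Only the Pointer and Non Pointer traces are handled; Callee-save and Compute traces are left out.

```python
def find_roots(stack, ra, trace_table):
    """Return the stack indices that hold roots.

    stack: flat list of slot values, newest frame first.
    ra: return address identifying the newest frame.
    trace_table: maps a return address to the layout of its frame.
    """
    roots = []
    base = 0
    while ra != 0:  # assumption: RA 0 marks the bottom of the stack
        entry = trace_table[ra]
        for offset, trace in enumerate(entry["traces"]):
            if trace == "pointer":
                roots.append(base + offset)
        # The saved RA inside this frame identifies the caller's frame.
        ra = stack[base + entry["ra_slot"]]
        base += entry["size"]
    return roots


# Hypothetical table with two frame layouts, keyed by return address.
trace_table = {
    0x10: {"size": 3, "ra_slot": 0,
           "traces": ["non-pointer", "pointer", "non-pointer"]},
    0x20: {"size": 2, "ra_slot": 0,
           "traces": ["non-pointer", "pointer"]},
}
# Newest frame (RA 0x10) occupies slots 0-2, its caller (RA 0x20) slots 3-4.
stack = [0x20, 55, 56, 0, 77]
roots = find_roots(stack, 0x10, trace_table)
```

Because each frame is found through the return address stored in the frame above it, the walk needs no per-frame headers at run time; all layout information lives in the table.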
Semispace against generational collection: time for K = 1.5
[Chart: time in ms (y-axis) per program (x-axis: CheckSum, Color, FFT, Grobner, Knuth-Bendix, Lexgen, Life, Peg, PIA, Simple); series: SemiSpace, Generational.]
Semispace against generational collection: time for K = 4
[Chart: time in ms (y-axis) per program (x-axis: CheckSum, Color, FFT, Grobner, Knuth-Bendix, Lexgen, Life, Peg, PIA, Simple); series: SemiSpace, Generational.]
Semispace against generational collection: number of GCs for K = 1.5
[Chart: number of collections (y-axis) per program (x-axis: CheckSum, Color, FFT, Grobner, Knuth-Bendix, Lexgen, Life, Peg, PIA, Simple); series: SemiSpace, Generational.]
Semispace against generational collection: number of GCs for K = 4
[Chart: number of collections (y-axis) per program (x-axis: CheckSum, Color, FFT, Grobner, Knuth-Bendix, Lexgen, Life, Peg, PIA, Simple); series: SemiSpace, Generational.]
Stack marking
• When the stack is deep, scanning the roots can take up a dominant share of GC time.
• Most of the stack usually does not change between the previous GC and the current one.
• Marking the stack frames that did not change can significantly speed up root scanning.
Marking the stack - 1st method
• Add a flag to each stack frame that records whether the frame has changed. The mutator sets the flag, and the collector resets it when it scans the frame.
• Disadvantages:
– The mutator is involved in the GC process.
– The compiled code must do extra work for the GC on every return, although most of the time no GC is in progress.
Marking the stack - 2nd method
• When scanning the roots, replace the RA of every n-th stack frame with the address of a special stub function.
• The stub function keeps a table of the original RAs.
• When a marked frame returns, the stub notes that the frame was deactivated and continues to the original RA.
Marking the stack - 2nd method (cont.)
• Problems with this method:
– Functions do not always return normally.
– When an exception is raised, handlers are tried in stack order until one matches, so the frames in between are popped without going through the stub.
– Fortunately, we can maintain a value M, updated on exceptions, that holds the shallowest stack pointer reached as a result of a raised exception.
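The second method, including the value M, can be simulated as follows. This is a toy model under stated assumptions: frames are list entries rather than machine frames, depth 0 is the oldest frame, a marker is a remembered depth rather than a patched return address, and the class and method names are hypothetical. The model treats only the frames strictly below a surviving marker as unchanged, since the marked frame itself may have been resumed.

```python
class MarkedStack:
    """Toy model of stack markers placed on every n-th frame."""

    def __init__(self, interval):
        self.frames = []       # oldest frame first
        self.interval = interval
        self.markers = set()   # depths of frames whose RA points to the stub
        self.m = None          # shallowest depth reached by exception unwinding

    def call(self, name):
        self.frames.append(name)

    def ret(self):
        self.frames.pop()
        # If the returning frame was marked, the stub runs and
        # records that the marker is gone.
        self.markers.discard(len(self.frames))

    def raise_to(self, handler_depth):
        # Unwinding skips the stubs; only M is updated.
        del self.frames[handler_depth:]
        self.m = handler_depth if self.m is None else min(self.m, handler_depth)

    def collect(self):
        """Return (frames reused from the cache, frames rescanned)."""
        if self.m is not None:
            # Markers at or above M belonged to frames popped by an exception.
            self.markers = {d for d in self.markers if d < self.m}
            self.m = None
        reused = max(self.markers) if self.markers else 0
        rescanned = len(self.frames) - reused
        # Place fresh markers on every n-th frame.
        self.markers = set(range(self.interval - 1, len(self.frames), self.interval))
        return reused, rescanned


s = MarkedStack(interval=2)
for i in range(6):
    s.call(f"f{i}")
first = s.collect()    # nothing cached yet
s.ret(); s.ret()       # f5 (marked) and f4 return normally
s.call("g")
second = s.collect()   # marker at depth 3 survived
s.raise_to(2)          # exception unwinds to a handler in f1
s.call("h")
third = s.collect()    # only the marker at depth 1 is still valid
```

The mutator does no extra work on ordinary returns; the cost is paid only when a marked frame returns or an exception is raised.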
Stack marker improvement
[Chart: improvement in % (y-axis, from -10 to 80) per program (x-axis: CheckSum, Color, FFT, Grobner, KB, Lexgen, Life, Nqueen, Peg, PIA, Simple).]
Pretenuring
• Use profile data to predict the survival rate of an object.
• We speculate that objects allocated at the same place in the program have similar lifetimes.
• To check this hypothesis, we attribute every heap allocation to its allocation site in the program.
Pretenuring - 2
• The compiler is modified so that every allocation updates a table of allocation sites.
• During garbage collection the table entries are updated.
• After each collection we scan the allocation area to locate dead objects and update the entries of their allocation sites.
Pretenuring - 3
• From this information we derive statistics on the number, size and average age of the objects created at each allocation site.
• We report only allocation sites that account for at least 1% of the allocations or 1% of the copied data.
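The bookkeeping described in the last three slides can be sketched as follows. This is an illustrative model, not the TIL profiler: the class `SiteProfile`, its method names, and the 0.8 survival threshold used to pick pretenuring candidates are assumptions. Only the 1% reporting filter comes from the slides.

```python
from collections import defaultdict


class SiteProfile:
    """Per-allocation-site counters gathered during a profiling run."""

    def __init__(self):
        self.allocated = defaultdict(int)  # bytes allocated at each site
        self.copied = defaultdict(int)     # bytes copied by the GC, per site

    def on_alloc(self, site, size):
        self.allocated[site] += size

    def on_copy(self, site, size):
        self.copied[site] += size

    def significant_sites(self, threshold=0.01):
        """Sites with at least 1% of the allocations or 1% of the copied data."""
        total_alloc = sum(self.allocated.values())
        total_copied = sum(self.copied.values())
        result = []
        for site in self.allocated:
            alloc_share = self.allocated[site] / total_alloc
            copy_share = self.copied[site] / total_copied if total_copied else 0.0
            if alloc_share >= threshold or copy_share >= threshold:
                result.append(site)
        return sorted(result)

    def pretenure_candidates(self, min_survival=0.8):
        """Significant sites whose data mostly survives (threshold is hypothetical)."""
        return [s for s in self.significant_sites()
                if self.copied[s] / self.allocated[s] >= min_survival]


profile = SiteProfile()
profile.on_alloc("A", 1000)  # many short-lived bytes, none copied
profile.on_alloc("B", 5)     # few bytes, all of them survive
profile.on_copy("B", 5)
profile.on_alloc("C", 3)     # too small to matter either way
```

Site B is below 1% of the allocations but accounts for all of the copied data, so it passes the filter; this is the kind of site pretenuring targets.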
The profile results
The results
• The results show that 90% of allocations have a very short lifetime, but 96-99% of the copied data is generated at 4 sites.
Using the profile data
• Objects created at allocation sites with long lifetimes are allocated directly in the older generation.
• Problem: an object allocated directly in the older generation may hold a reference to an object in the younger generation.
Solutions ?
• Allocate objects of that kind in the young generation.
– May lead to a lot more copying.
• Remember the area of the older generation that holds references into the young generation, and scan it on each minor collection.
– Scanning without copying does not take much time.
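The second solution can be sketched as follows. This is a simplified model under stated assumptions: young objects are dictionary entries listing the young objects they reference, the remembered area is a list of pretenured objects, and `minor_collect` is a hypothetical name. A real collector would copy the survivors; here only the set of survivors is computed.

```python
def minor_collect(young, roots, pretenured_area):
    """Return the ids of the young objects that survive a minor collection.

    young: dict mapping a young object id to the young ids it references.
    roots: young ids referenced from the stack and registers.
    pretenured_area: pretenured objects, each a list of young ids it references.
    """
    worklist = list(roots)
    # The remembered area is scanned for young references, but the
    # pretenured objects themselves are never copied.
    for obj in pretenured_area:
        worklist.extend(obj)

    live = set()
    while worklist:
        oid = worklist.pop()
        if oid not in live:
            live.add(oid)
            worklist.extend(young[oid])
    return live


young = {"a": ["b"], "b": [], "c": [], "d": []}
survivors = minor_collect(young, roots=["a"], pretenured_area=[["c"]])
```

Object `c` is reachable only from the pretenured area, so it would be lost if that area were not scanned; object `d` is unreachable and dies.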
Improvement of pretenuring (ms)
              Generational collection    Generational collection with pretenuring
Program       K=1.5   K=2.0   K=4.0      K=1.5   K=2.0   K=4.0      %Improve
Knuth-Bendix   7.66    8.00    8.07       1.44    1.76    1.88      33
Lexgen         3.20    2.58    2.43       2.63    2.00    1.55      27
Nqueen         1.83    1.86    1.95      13.88   14.03   13.53      50
Simple         5.05    4.81    4.33       3.58    3.74    3.71      12
Improvement of pretenuring (bytes copied)
              Generational collection                Generational collection with pretenuring
Program       K=1.5       K=2.0       K=4.0          K=1.5       K=2.0       K=4.0          %Improve
Knuth-Bendix  14,569,800  17,869,436  17,695,560      2,050,212   5,376,156   5,151,708     70
Lexgen        27,427,544  18,647,632  16,435,292     24,278,388  15,452,696  13,397,340     18
Nqueen         5,312,548   5,312,548   5,312,548        194,256     194,256     194,256     96
Simple        25,771,348  25,431,144  25,430,248     14,241,500  14,734,176  14,133,376     44
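As a worked check of the bytes-copied table, the %Improve column appears to match the reduction at K=4.0, truncated to a whole percent. The slides do not state how the column was computed, so both the choice of column and the truncation are assumptions; the helper name `percent_reduction` is hypothetical.

```python
# Bytes copied at K=4.0 from the table: (generational, with pretenuring).
BYTES_COPIED_K4 = {
    "Knuth-Bendix": (17_695_560, 5_151_708),
    "Lexgen": (16_435_292, 13_397_340),
    "Nqueen": (5_312_548, 194_256),
    "Simple": (25_430_248, 14_133_376),
}


def percent_reduction(before, after):
    """Percentage of copied bytes saved, truncated to a whole number."""
    return 100 * (before - after) // before


reductions = {
    program: percent_reduction(before, after)
    for program, (before, after) in BYTES_COPIED_K4.items()
}
for program, value in reductions.items():
    print(f"{program}: {value}%")
```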
Comparing all the methods
[Chart: one group of bars per program (x-axis: Color, Grobner, KB, Lexgen, Life, Nqueen, PIA, Simple), y-axis from 0 to 120; series: Generational, Stack Markers, Pretenuring with Stack Marker.]
Conclusion for pretenuring
• The reduction in GC time is smaller than expected from the reduction in data copied.
• Since the younger generations still have to be checked, GC time remains proportional to the live data (with a smaller constant).
Suggestion to improve the speed
• Apply control-flow and data-flow analysis to objects.
Conclusions
• The generational collector is twice as fast in GC time; part of the gain comes from its better cache locality.
• For programs with deep stacks, caching the root data can improve GC time by up to 74%.
• Profiling the heap can improve speed by 50% in some cases.
The End