Rethinking Garbage Collection
Rifat Shahriyar
Who Am I?
• Asst. Prof., Dept. of CSE, BUET
• PhD from Australian National University (ANU)
• Dissertation title – High Performance Reference Counting and Conservative Garbage Collection
• Supervised by – Steve Blackburn (ANU) – Kathryn McKinley (MSR)
Java Virtual Machine
The Birth of GC (1960)
Today
Garbage collection is ubiquitous
• GC algorithms – Tracing and Reference counting
• GC implementations – Exact and Conservative
✔ Tracing and Exact in all highly engineered, high performance systems
✘ Reference counting and conservative only in non-performance critical settings
GC Fundamentals: Algorithmic Components
• Allocation – Bump Allocation – Free List
• Identification – Tracing (implicit) – Reference Counting (explicit)
• Reclamation – Sweep-to-Free – Compact – Evacuate
GC Fundamentals: Canonical Garbage Collectors
• Mark-Sweep [McCarthy 1960] – Free-list + trace + sweep-to-free
• Mark-Compact [Styger 1967] – Bump allocation + trace + compact
• Semi-Space [Cheney 1970] – Bump allocation + trace + evacuate
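To make the "free-list + trace + sweep-to-free" recipe concrete, here is a minimal Python sketch of a mark-sweep cycle. It is an illustration only, not code from the talk: the `Obj` class, the helper names, and the A to F object graph are all assumptions chosen for the example.

```python
class Obj:
    """A toy heap object: a name, outgoing references, and a mark bit."""
    def __init__(self, name):
        self.name = name
        self.refs = []
        self.marked = False


def mark(roots):
    """Trace: mark every object reachable from the roots (iterative DFS)."""
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if not obj.marked:
            obj.marked = True
            stack.extend(obj.refs)


def sweep(heap, free_list):
    """Sweep-to-free: unmarked objects go to the free list; marks are reset."""
    live = []
    for obj in heap:
        if obj.marked:
            obj.marked = False      # clear the mark for the next collection
            live.append(obj)
        else:
            free_list.append(obj)   # the cell becomes available for reuse
    return live


def collect(heap, roots, free_list):
    """One mark-sweep cycle; returns the surviving objects."""
    mark(roots)
    return sweep(heap, free_list)


# Hypothetical object graph (the edges are an assumption for this example).
a, b, c, d, e, f = (Obj(n) for n in "ABCDEF")
a.refs = [b, c]
b.refs = [d]
e.refs = [f]                        # E and F are not reachable from the roots
heap = [a, b, c, d, e, f]
free_list = []
heap = collect(heap, [a], free_list)
live_names = sorted(o.name for o in heap)
freed_names = sorted(o.name for o in free_list)
```

Mark-compact and semi-space keep the same trace but swap the reclamation step: instead of pushing dead cells onto a free list, they slide or copy the survivors so that allocation can stay a simple pointer bump.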
Generational [Ungar 1984]
• Most objects die young
• Nursery space and mature space
Immix [Blackburn and McKinley 2008]
• Contiguous allocation into regions – 256B lines and 32KB blocks – Objects span lines but not blocks
• Simple mark phase – Mark objects and containing regions
• Free unmarked regions
• Recycled allocation and defragmentation
[Figure: a block divided into lines, showing object marks, line marks, and recyclable lines]
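The line and block arithmetic above can be sketched in a few lines of Python. The sizes come from the slide; everything else (function names, the `(offset, size)` representation, the three block categories) is an assumption for illustration. This sketch marks every line a live object touches, and the actual collector's line-marking policy may differ.

```python
LINE_SIZE = 256                              # bytes, from the slide
BLOCK_SIZE = 32 * 1024                       # bytes, from the slide
LINES_PER_BLOCK = BLOCK_SIZE // LINE_SIZE    # 128 lines per block


def lines_spanned(offset, size):
    """Line indices covered by an object at [offset, offset + size) in a block."""
    if size <= 0 or offset < 0 or offset + size > BLOCK_SIZE:
        raise ValueError("objects may span lines but not blocks")
    first = offset // LINE_SIZE
    last = (offset + size - 1) // LINE_SIZE
    return range(first, last + 1)


def mark_lines(live_objects):
    """Build one block's line mark table from live (offset, size) pairs."""
    line_marks = [False] * LINES_PER_BLOCK
    for offset, size in live_objects:
        for line in lines_spanned(offset, size):
            line_marks[line] = True
    return line_marks


def classify_block(line_marks):
    """'free' if no line is marked, 'full' if all are, otherwise 'recyclable'."""
    marked = sum(line_marks)
    if marked == 0:
        return "free"
    if marked == LINES_PER_BLOCK:
        return "full"
    return "recyclable"


# Hypothetical live objects in one block, as (offset, size) in bytes.
live = [(0, 64), (200, 100), (1024, 512)]
marks = mark_lines(live)
marked_lines = [i for i, m in enumerate(marks) if m]
block_state = classify_block(marks)
```

In this example the second object straddles the boundary between lines 0 and 1, so both lines are marked; the unmarked lines of the block remain available for recycled allocation.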
Down for the Count? Getting Reference Counting Back in the Ring
ISMM’12
Tracing [McCarthy 1960]
[Figure: object graph of A to F traced from the roots; objects not reached from the roots are marked ✗]
Reference Counting [Collins 1960]
[Figure: object graph annotated with per-object reference counts; an object whose count drops to 0 is reclaimed (✗)]
Why Reference Counting?
Advantages
✔ Immediacy
✔ Object local
✔ Basic RC is easy
Disadvantages
✘ Cycles
✘ Performance
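A small Python sketch of basic (naive) reference counting shows both sides of this trade-off: an object is reclaimed the instant its count reaches zero, but a cycle keeps itself alive. The `RCObject` class, the `write` barrier, and the object names are assumptions made for the example, not the talk's implementation.

```python
class RCObject:
    """A toy object with an explicit reference count."""
    def __init__(self, name, freed):
        self.name = name
        self.count = 0
        self.fields = {}
        self._freed = freed         # shared list recording reclamation order


def inc(obj):
    obj.count += 1


def dec(obj):
    obj.count -= 1
    if obj.count == 0:
        # Immediacy: reclaim now, then release everything this object held.
        obj._freed.append(obj.name)
        for child in obj.fields.values():
            dec(child)
        obj.fields.clear()


def write(src, field, new):
    """Write barrier for src.field = new: count the new target, release the old."""
    old = src.fields.get(field)
    if new is not None:
        inc(new)                    # increment first so self-assignment is safe
        src.fields[field] = new
    else:
        src.fields.pop(field, None)
    if old is not None:
        dec(old)


freed = []

# Acyclic case: dropping the only root reference frees A, then B.
a, b = RCObject("A", freed), RCObject("B", freed)
inc(a)                  # a root reference to A
write(a, "x", b)        # A -> B
dec(a)                  # root dropped

# Cyclic case: X and Y reference each other, so neither count reaches zero.
x, y = RCObject("X", freed), RCObject("Y", freed)
inc(x)                  # a root reference to X
write(x, "f", y)        # X -> Y
write(y, "f", x)        # Y -> X closes the cycle
dec(x)                  # root dropped, but X is still referenced by Y
```

After this runs, `freed` holds A and B, while X and Y each keep a count of 1 even though nothing reachable from the roots refers to them. Reclaiming them needs either a cycle collector or a backup trace.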
Problem
✔ One of the two fundamental GC algorithms
✔ Many advantages
✘ Neglected by performance-conscious VMs
So how much slower is it? 30%
Can we get RC back in the ring?
Optimizing RC
• Limited bit count – Use just a few bits, fix overflow with backup tracing
• Elision of new object counts – Only do RC work if object survives the first GC
• Born as dead – Avoid free-list work for short-lived objects
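The limited bit count idea can be sketched as a saturating counter plus a backup trace that recomputes counts from the roots. This is a hedged illustration: the 2-bit width, the "stuck" convention (a saturated count ignores further increments and decrements), and the dictionary-based heap are assumptions for the example rather than details taken from the slides.

```python
RC_BITS = 2                       # assumed width, for illustration only
STUCK = (1 << RC_BITS) - 1        # 3: the saturated ("stuck") count


def inc(count):
    """Increment, saturating at STUCK."""
    return count if count == STUCK else count + 1


def dec(count):
    """Decrement, except that a stuck count stays stuck until a backup trace."""
    return count if count == STUCK else count - 1


def backup_trace(roots, edges):
    """Recompute limited counts by tracing from the roots.

    edges maps each object to the objects it references. Objects missing
    from the result were never reached and can be reclaimed, including
    objects whose counts were stuck.
    """
    counts = {}
    stack = []

    def visit(obj):
        first_visit = obj not in counts
        counts[obj] = inc(counts.get(obj, 0))   # one count per incoming reference
        if first_visit:
            stack.append(obj)

    for root in roots:
        visit(root)
    while stack:
        for target in edges.get(stack.pop(), []):
            visit(target)
    return counts


# Five references overflow a 2-bit count; five releases cannot bring it back.
c = 0
for _ in range(5):
    c = inc(c)
for _ in range(5):
    c = dec(c)

# Hypothetical heap: R is a root and references Q; P is unreachable but stuck.
edges = {"R": ["Q"], "P": []}
counts = backup_trace(["R"], edges)
```

Here `c` ends at the stuck value, so reference counting alone would never free that object. The backup trace fixes this: P does not appear in `counts`, so it is identified as garbage regardless of its stuck count.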
Optimized RC vs. MS
New RC ≈ MS
[Figure: bar chart of time relative to MS (faster ← Time → slower, -20% to 80%) for Old RC and New RC across compress, jess, db, javac, mtrt, jack, avrora, bloat, eclipse, fop, hsqldb, luindex, pmd, sunflow, xalan, pjbb2005, and geomean]
Summary
• Old RC – 30% slower than MS – 40% slower than production
• New RC – Limited bit count – Optimization for new objects
• Performance – Matches MS – Still 10% slower than production
[Figure: total time vs. production: 40% before 2012, 10% in 2012]
Taking Off the Gloves with Reference Counting Immix
OOPSLA’13
Why So Slow?
[Figure: time vs. production: Total 10%, Mutator 10%, GC -3%]
Looking a Little Deeper…
[Figure: mutator performance vs. production for RC, MS, SS, and Immix, measured as time, instructions retired, and L1 D cache misses; values shown: 10%, 9%, 32%; 7%, 4%, 28%; -2%, -3%, -2%, -3%, -3%; 1%]
Free List: RC, MS
Bump Pointer: SS, Immix
Goal & Challenge
• Goal – Object-local collection – Excellent mutator locality – Copying to eliminate fragmentation
• Immix provides opportunistic copying
✔ Same mutator locality as contiguous allocator
• However, RC is inherently local – References to an object are generally unknown – but copying must redirect all references
RC Immix
✔ Combines RC and Immix
✔ Line/block reclamation
✔ Line live object count with object reference count
✔ Exploit Immix’s opportunistic copy
✔ Observe new objects can be copied by first GC
✔ Observe old objects can be copied by backup GC
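One way to picture "line live object count with object reference count" is the Python sketch below: each line tracks how many live objects overlap it, and the count is adjusted when an object's reference count leaves or returns to zero. The class and function names, the `(offset, size)` layout, and the rule that an object counts toward every line it touches are all assumptions for illustration; the actual RC Immix bookkeeping may differ.

```python
LINE_SIZE = 256                   # bytes, as in Immix
LINES_PER_BLOCK = 128             # 32KB block / 256B lines


class Block:
    """A block whose lines each carry a count of live objects."""
    def __init__(self):
        self.line_live = [0] * LINES_PER_BLOCK

    def _lines(self, offset, size):
        return range(offset // LINE_SIZE, (offset + size - 1) // LINE_SIZE + 1)

    def object_became_live(self, offset, size):
        for line in self._lines(offset, size):
            self.line_live[line] += 1

    def object_died(self, offset, size):
        for line in self._lines(offset, size):
            self.line_live[line] -= 1

    def free_lines(self):
        """Lines with no live objects can be reclaimed and reused."""
        return [i for i, n in enumerate(self.line_live) if n == 0]

    def is_free(self):
        """A block with no live lines can be reclaimed whole."""
        return not any(self.line_live)


class Obj:
    """A toy object placed at a byte offset inside a block."""
    def __init__(self, block, offset, size):
        self.block, self.offset, self.size = block, offset, size
        self.rc = 0


def inc(obj):
    if obj.rc == 0:               # first reference: the object becomes live
        obj.block.object_became_live(obj.offset, obj.size)
    obj.rc += 1


def dec(obj):
    obj.rc -= 1
    if obj.rc == 0:               # last reference gone: release its lines
        obj.block.object_died(obj.offset, obj.size)


block = Block()
o1 = Obj(block, 0, 100)           # sits entirely in line 0
o2 = Obj(block, 200, 100)         # straddles lines 0 and 1
inc(o1)
inc(o2)
after_alloc = block.line_live[:2]
dec(o2)
after_o2_dies = block.line_live[:2]
line1_free = 1 in block.free_lines()
dec(o1)
block_free = block.is_free()
```

When `o2` dies, line 1 becomes free immediately while line 0 stays live because `o1` still occupies it; once `o1` dies as well, the whole block is reclaimable. This keeps reclamation object-local while still recovering memory at line and block granularity.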
Total time
3% faster than Gen Immix, +6% worst case, -21% best case
[Figure: bar chart of total time (faster ← Time → slower, -30% to 40%) for RC and RC Immix across compress, jess, db, javac, mtrt, jack, avrora, bloat, eclipse, fop, hsqldb, jython, luindex, lusearchfix, pmd, sunflow, xalan, pjbb2005, and geomean]
Summary
• RC Immix – Object-local collection – Excellent mutator locality – Copying with RC
• Great performance – Outperforms fastest production
• Transforms RC
[Figure: total time vs. production: RC 10%, RC Immix -3% (2013)]
Questions?