Topic 17: Garbage Collection
Compiler Design
Prof. Hanjun Kim
CoreLab (Compiler Research Lab)
POSTECH
Garbage Collection
• Garbage
  • A value that will not be used in any subsequent computation by a program
• Garbage Collection
  • Operation that makes space belonging to garbage data available for reuse
• Is GC important?
  • Many modern programming languages allow programmers to allocate new storage dynamically
    • New records, arrays, tuples, objects, closures, etc.
  • They need facilities for reclaiming and recycling the storage used by programs
• Who will determine which objects are garbage?
A solution
• Explicit Memory Management
  • User library manages memory; programmer decides when and where to allocate and deallocate
    • void *malloc(size_t n)
    • void free(void *addr)
  • Library calls OS for more pages when necessary
• Advantage: people are smart
• Disadvantage: people are dumb and they really don’t want to bother with such details if they can avoid it
• Always worrying about dangling pointers, memory leaks: a huge software engineering burden
Another Solution
• Automatic Memory Management
  • How do we decide which objects are garbage?
    • Can’t do it exactly (whether a value will be used again is undecidable in general)
    • Therefore, we conservatively approximate
  • Normal solution: an object is garbage when it becomes unreachable from the roots
    • The roots = registers, stack, global static data
    • If there is no path from the roots to an object, it cannot be used later in the computation, so we can safely recycle its memory
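The reachability rule above can be sketched in a few lines. This is only an illustration under assumptions: the heap is modelled as a hypothetical dictionary from object names to the names they point to, and `reachable` is a made-up helper, not part of any real collector.

```python
def reachable(roots, heap):
    """Return the set of objects reachable from the roots.

    heap maps an object name to the list of object names it points to
    (a toy stand-in for records holding pointers).
    """
    seen = set()
    worklist = list(roots)
    while worklist:
        obj = worklist.pop()
        if obj in seen:
            continue
        seen.add(obj)
        worklist.extend(heap[obj])
    return seen


# "a" is a root and points to "b"; "c" and "d" only point to each other.
heap = {"a": ["b"], "b": [], "c": ["d"], "d": ["c"]}
roots = ["a"]

live = reachable(roots, heap)
garbage = set(heap) - live
```

Here `live` contains "a" and "b", and `garbage` contains "c" and "d". The approximation is conservative: "b" is kept even if the program never reads it again, because it is still reachable.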
Algorithms
• Reference Counting
• Mark and Sweep
• Copying Collection
  • Basic
  • Cheney’s algorithm
• Generational Algorithm
• Incremental Algorithm
  • Baker’s algorithm
Reference Counting
• Each object has a reference count
• Reference count
  • Number of references to the object
  • Initially, 1
• If reference count becomes 0, the object is garbage
Reference Counting
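• Example instruction: obj = p
• Algorithm
  • Before the instruction
    • Decrease the reference count of the object that obj currently points to
    • If the count becomes 0, put that object on the free list
  • After the instruction
    • Increase the reference count of the object that obj now points to (the one p refers to)
• Changed code
  obj.count--;
  if (obj.count == 0) putOnFreeList(obj);
  obj = p;
  obj.count++;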
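The counted assignment can be tried out in a small simulation. This is a sketch under assumptions, not the slide's exact code: `Obj`, `inc`, `dec`, `assign` and `free_list` are hypothetical names, the new target is incremented before the old one is decremented (so that assigning a field to itself is safe), and a freed object's outgoing references are released eagerly.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.count = 0      # number of references to this object
        self.fields = {}    # field name -> Obj or None


free_list = []


def inc(o):
    if o is not None:
        o.count += 1


def dec(o):
    if o is None:
        return
    o.count -= 1
    if o.count == 0:
        # The object's own outgoing references disappear with it.
        for child in o.fields.values():
            dec(child)
        free_list.append(o)


def assign(holder, field, p):
    """Counted version of holder.field = p."""
    inc(p)                          # count the new reference first
    dec(holder.fields.get(field))   # then drop the overwritten one
    holder.fields[field] = p


root = Obj("root")
root.count = 1                      # assumption: the root itself is never freed
a, b = Obj("a"), Obj("b")
assign(root, "x", a)                # a.count == 1
assign(a, "next", b)                # b.count == 1
assign(root, "x", None)             # a becomes garbage, and so does b
```

After the last assignment both counts are 0 and `free_list` holds b and then a.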
Reference Counting
• Pros
  • Simple!
• Cons
  • Very expensive!
    • Counts must be updated on every pointer assignment
  • Cycles of garbage cannot be reclaimed!
    • Need to check reachability
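The cycle problem is easy to reproduce. The following is a minimal sketch with hypothetical names (`Node`, `set_next`, `release`, `freed`); it is a toy counter, not how any particular runtime implements reference counting.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.count = 0
        self.next = None


freed = []


def release(o):
    """Drop one reference to o; free it when the count reaches 0."""
    if o is None:
        return
    o.count -= 1
    if o.count == 0:
        child, o.next = o.next, None
        freed.append(o.name)
        release(child)


def set_next(holder, target):
    """Counted version of holder.next = target."""
    if target is not None:
        target.count += 1
    old = holder.next
    holder.next = target
    release(old)


root = Node("root")
root.count = 1          # assumption: the root is pinned
a, b = Node("a"), Node("b")
set_next(root, a)       # a.count == 1
set_next(a, b)          # b.count == 1
set_next(b, a)          # a.count == 2, a and b now form a cycle
set_next(root, None)    # a.count drops to 1, never to 0
```

Neither a nor b is reachable from the root any more, yet each still has a count of 1 and `freed` stays empty. Only a reachability check would find them.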
Mark and Sweep
• Marking
  • Assume that all objects are unreached
  • Mark all the nodes reachable from the roots with a depth-first search
• Pseudo code
  function DFS(x)
    if x is a pointer into the heap
      if x is not marked
        mark x
        for each field f of x
          DFS(x.f)
  function marking()
    for each root v
      DFS(v)
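The marking pseudo code translates almost line for line. In this sketch, the `Record` class is an assumption standing in for heap records, and "x is a pointer into the heap" is approximated by an `isinstance` check; plain integers play the role of non-pointer fields.

```python
class Record:
    def __init__(self, name, *fields):
        self.name = name
        self.fields = list(fields)  # each field is a Record or a plain value
        self.marked = False


def dfs(x):
    if isinstance(x, Record):       # "x is a pointer into the heap"
        if not x.marked:
            x.marked = True
            for f in x.fields:
                dfs(f)


def marking(roots):
    for v in roots:
        dfs(v)


c = Record("c", 42)
b = Record("b", c)
a = Record("a", b, 7)
c.fields.append(a)                  # cycle: a -> b -> c -> a
d = Record("d", a)                  # points to a, but nothing points to d

marking([a])
```

After `marking([a])`, a, b and c are marked and d is not. The mark bit also stops the search from looping on the cycle. The recursion is as deep as the longest pointer chain, so a real collector would typically use an explicit stack instead.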
Mark and Sweep
• Sweeping
  • Place all the unreached objects into the freelist
• Pseudo code
  function sweeping()
    p = first address in the heap
    while p < last address in the heap
      if record p is marked
        unmark p
      else
        addToFreelist(p)
      p = p + (size of record p)
• Fragmentation Problem
  • When a program allocates a record of size n, there may be many free blocks smaller than n, but none of size n or larger
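A toy run of the sweep phase also shows the fragmentation problem. The heap layout is an assumption for illustration: a dictionary from start address to a record with a size and a mark bit, laid out contiguously over addresses 0 to 9. `can_allocate` is a hypothetical helper.

```python
def sweeping(heap, heap_end):
    """Walk the heap by address; return the free list as (address, size) pairs."""
    freelist = []
    p = 0                                  # first address in the heap
    while p < heap_end:
        rec = heap[p]
        if rec["marked"]:
            rec["marked"] = False          # unmark for the next collection
        else:
            freelist.append((p, rec["size"]))
        p += rec["size"]                   # p = p + (size of record p)
    return freelist


heap = {
    0: {"size": 2, "marked": False},
    2: {"size": 3, "marked": True},
    5: {"size": 2, "marked": False},
    7: {"size": 1, "marked": True},
    8: {"size": 2, "marked": False},
}

freelist = sweeping(heap, 10)
total_free = sum(size for _, size in freelist)


def can_allocate(n):
    """True if some single free block can hold a record of size n."""
    return any(size >= n for _, size in freelist)
```

Six words are free in total, but they sit in three separate blocks of size 2, so `can_allocate(4)` is false while `can_allocate(2)` is true.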
Copying Collection
• Basic Idea: use 2 heaps
  • One used by program (active heap)
  • The other unused until GC time
• GC:
  • Start at the root sets & traverse the reachable data
  • Copy reachable data from the active heap (from-space) to the other heap (to-space)
  • Dead objects are left behind in from-space
  • Heaps switch roles
Cheney’s algorithm
• Copying collection based on breadth-first search
• Pseudo code
  function Cheney()
    scan = next = beginning of to-space
    for each root r
      r = forward(r)   // copies r's record into to-space, which increases next
    while scan < next
      for each field f of the record at scan
        scan.f = forward(scan.f)
      scan = scan + (size of the record at scan)
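The pseudo code leaves `forward` undefined, so the sketch below fills it in under assumptions: to-space is modelled as a Python list (so `next` is simply its length and `scan` is an index), each `Record` carries a hypothetical `space` tag and a `forward` slot for the forwarding pointer, and record sizes are ignored.

```python
class Record:
    def __init__(self, name, fields=(), space="from"):
        self.name = name
        self.fields = list(fields)  # each field is a Record or a plain value
        self.space = space          # "from" or "to"
        self.forward = None         # forwarding pointer, set once copied


def cheney(roots):
    to_space = []                   # 'next' is len(to_space)

    def forward(p):
        if not isinstance(p, Record) or p.space != "from":
            return p                # not a pointer into from-space
        if p.forward is None:       # first visit: copy and leave a forwarding pointer
            p.forward = Record(p.name, p.fields, space="to")
            to_space.append(p.forward)
        return p.forward

    new_roots = [forward(r) for r in roots]
    scan = 0
    while scan < len(to_space):     # while scan < next
        rec = to_space[scan]
        rec.fields = [forward(f) for f in rec.fields]
        scan += 1
    return new_roots, to_space


c = Record("c")
b = Record("b", [c])
a = Record("a", [b, c])
c.fields.append(a)                  # cycle back to a
d = Record("d", [a])                # garbage: nothing points to d

new_roots, to_space = cheney([a])
```

The records land in to-space in breadth-first order (a, b, c). The forwarding pointers ensure that the shared record c and the cycle back to a are copied once only, and d is simply left behind.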
Generational GC
• Observation
  • If an object has been reachable for a long time, it is likely to remain so
  • Most objects die young
• Conclusion
  • Do GC for the young objects frequently
  • Avoid scanning the old objects
• Generational GC
  • Divide the heap into partitions P0, P1, …
  • Each partition holds older objects than the one before it
Generational GC
• Create new objects in P0
• When P0 fills,
  • Garbage collect P0 only
  • Move the reachable objects to P1
• When P1 fills,
  • Garbage collect P0 and P1
  • Move the reachable objects of P0 and P1 to P1 and P2, respectively
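A two-generation sketch of the "collect P0 only" step follows. The slides above do not say how pointers from P1 into P0 are found without scanning P1; this sketch assumes a remembered set filled in by a write barrier, which is one common approach. All names (`Obj`, `Heap`, `store`, `minor_gc`) are hypothetical.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.fields = []
        self.gen = 0                # new objects live in P0


class Heap:
    def __init__(self):
        self.p0 = []                # young generation
        self.p1 = []                # old generation
        self.remembered = []        # old objects that were given a pointer into P0

    def alloc(self, name):
        o = Obj(name)
        self.p0.append(o)
        return o

    def store(self, holder, target):
        """Add a pointer holder -> target, with a write barrier."""
        holder.fields.append(target)
        if holder.gen > 0 and target.gen == 0:
            self.remembered.append(holder)

    def minor_gc(self, roots):
        """Collect P0 only; reachable young objects move to P1."""
        worklist = list(roots)
        for old in self.remembered:
            worklist.extend(old.fields)
        seen, survivors = set(), []
        while worklist:
            o = worklist.pop()
            if o.gen != 0 or o in seen:
                continue            # old objects are not scanned
            seen.add(o)
            survivors.append(o)
            worklist.extend(o.fields)
        for o in survivors:
            o.gen = 1
        self.p1.extend(survivors)
        self.p0 = []
        self.remembered = []


h = Heap()
a = h.alloc("a")
h.alloc("tmp")
h.minor_gc([a])                     # a survives and moves to P1, tmp is dropped

b = h.alloc("b")
c = h.alloc("c")
h.store(a, b)                       # old -> young pointer, recorded by the barrier
h.minor_gc([a])                     # b survives through the remembered set
```

After the second collection P1 holds a and b, and c has been dropped. Without the remembered set, b would be lost, because the old object a is never scanned during a P0-only collection.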
Incremental GC
• Observation
  • GC sometimes interrupts the program for long periods
  • The long response time may cause serious problems, especially for interactive or real-time programs
• Solution
  • Incremental (Concurrent) GC
  • Run GC in parallel with mutation (program execution)
Baker’s algorithm
• Based on Cheney’s copying collection
• When GC is initiated,
  • Change the roles of from-space and to-space
  • Forward all the roots
  • Resume mutation
• When the mutator allocates memory,
  • Scan a few pointers
    • scan advances toward next
  • Return memory in the to-space
• When the mutator fetches a pointer that points into from-space
  • Forward the pointer to to-space
  • Extra fetch code = 20% performance penalty
  • But no long pauses ==> better response time
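The three rules above can be sketched as an incremental version of Cheney's loop. This is a simplified model under assumptions: spaces are Python lists, record sizes are ignored, a new cycle is only started after the previous one has finished, and the names (`BakerHeap`, `start_gc`, `scan_some`, `fetch`, the `work` parameter) are hypothetical.

```python
class Record:
    def __init__(self, name, fields=(), space="to"):
        self.name = name
        self.fields = list(fields)
        self.space = space
        self.forward = None


class BakerHeap:
    def __init__(self):
        self.to_space = []      # records copied by the collector; 'next' is its length
        self.new_objects = []   # records allocated by the mutator during collection
        self.scan = 0
        self.roots = []

    def forward(self, p):
        if not isinstance(p, Record) or p.space != "from":
            return p
        if p.forward is None:
            p.forward = Record(p.name, p.fields, space="to")
            self.to_space.append(p.forward)
        return p.forward

    def start_gc(self):
        """Flip the spaces and forward the roots; the mutator then resumes."""
        for r in self.to_space + self.new_objects:
            r.space = "from"
        self.to_space, self.new_objects, self.scan = [], [], 0
        self.roots = [self.forward(r) for r in self.roots]

    def scan_some(self, k):
        """Do a bounded amount of Cheney scanning."""
        while k > 0 and self.scan < len(self.to_space):
            rec = self.to_space[self.scan]
            rec.fields = [self.forward(f) for f in rec.fields]
            self.scan += 1
            k -= 1

    def alloc(self, name, fields=(), work=2):
        self.scan_some(work)    # each allocation pays for a little GC work
        rec = Record(name, fields, space="to")
        self.new_objects.append(rec)
        return rec

    def fetch(self, rec, i):
        """Read barrier: the mutator never sees a from-space pointer."""
        rec.fields[i] = self.forward(rec.fields[i])
        return rec.fields[i]


h = BakerHeap()
a = h.alloc("a")
b = h.alloc("b", [a])
c = h.alloc("c", [b])
g = h.alloc("garbage")
h.roots = [c]

h.start_gc()                    # only the root c is copied here
root = h.roots[0]
nb = h.fetch(root, 0)           # read barrier copies b on demand
h.alloc("x", [nb])              # scans c and b, which copies a
h.alloc("y")                    # scans a; scan catches up with next
```

No step does more than a bounded amount of copying, and the collection finishes once `scan` catches up with the end of `to_space`. The unreachable record is never copied.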