knowing your garbage collector / python madrid
DESCRIPTION
Talk about garbage collection in CPython and PyPy
TRANSCRIPT
Knowing your garbage collector
Francisco Fernandez Castano
Rushmore.fm
@fcofdezc
October 21, 2014
Francisco Fernandez Castano (@fcofdezc) Python GC October 21, 2014 1 / 37
Overview
1 Introduction
    Motivation
    Concepts
2 Algorithms
    CPython RC
    PyPy
Motivation
Managing memory manually is hard.
Who owns the memory?
Should I free these resources?
What happens with double frees?
Dangling pointers
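int *func(void)
{
    int num = 1234;   /* local variable, lives on func's stack frame */
    /* ... */
    return &num;      /* bug: the address is invalid once func returns */
}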
Ownership
#include <stdlib.h>

/* Who frees this buffer? Nothing in the signature says the caller must. */
int *func(void)
{
    int *num = malloc(10 * sizeof(int));
    /* ... */
    return num;
}
John McCarthy
Basic concepts
Heap
A data structure in which objects may be allocated or deallocated in any order.
Mutator
The part of a running program which executes application code.
Collector
The part of a running program responsible for garbage collection.
Garbage collection
Definition
Garbage collection is automatic memory management. While the mutator runs, it routinely allocates memory from the heap. If more memory than available is needed, the collector reclaims unused memory and returns it to the heap.
CPython GC
The CPython implementation has garbage collection.
Its GC algorithm is reference counting with a cycle detector.
The cycle detector is generational.
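The generational side of the collector can be inspected from Python through the standard gc module. A minimal sketch, assuming a CPython interpreter; the exact threshold values, and how many generations are really in use, vary between CPython versions.

```python
import gc

# Collection thresholds as a tuple (threshold0, threshold1, threshold2).
thresholds = gc.get_threshold()
# Current collection counters as a tuple (count0, count1, count2).
counts = gc.get_count()

print("cycle detector enabled:", gc.isenabled())
print("thresholds:", thresholds)
print("current counts:", counts)
```

Reference counting itself cannot be switched off; gc.disable() only stops the automatic runs of the cycle detector.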
Young objects
[elem * 2 for elem in elements]
balance = (a / b / c) * 4
'asdadsasd-xxx'.replace('x', 'y').replace('a', 'b')
foo.bar()
PyObject
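typedef struct _object {
    _PyObject_HEAD_EXTRA            /* empty unless built with Py_TRACE_REFS */
    Py_ssize_t ob_refcnt;           /* reference count */
    struct _typeobject *ob_type;    /* pointer to the object's type */
} PyObject;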
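Both fields are visible from Python: type() reports what ob_type points at, and sys.getrefcount() reports a value derived from ob_refcnt. A small sketch, assuming CPython; Account and holder are hypothetical names used only for illustration. The absolute numbers depend on the interpreter version, so only the differences are meaningful.

```python
import sys


class Account:
    """Hypothetical class used only for this illustration."""


obj = Account()
before = sys.getrefcount(obj)

holder = [obj]           # the list stores one more reference to obj
after_add = sys.getrefcount(obj)

holder.clear()           # dropping that reference lowers the count again
after_drop = sys.getrefcount(obj)

print(type(obj).__name__, before, after_add, after_drop)
```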
PyTypeObject
typedef struct _typeobject {
    PyObject_VAR_HEAD
    const char *tp_name;
    Py_ssize_t tp_basicsize, tp_itemsize;
    destructor tp_dealloc;    /* called when the reference count reaches zero */
    printfunc tp_print;
    getattrfunc tp_getattr;
    setattrfunc tp_setattr;
    void *tp_reserved;
    /* ... */
} PyTypeObject;
Reference Counting Algorithm
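The idea can be modelled in a few lines of Python. This is a toy sketch, not CPython's implementation: Cell, incref, decref, add_reference and freed are hypothetical names, and the "heap" is just ordinary Python objects carrying an explicit counter.

```python
class Cell:
    """Toy heap object with an explicit reference count (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.refcount = 0
        self.children = []   # outgoing references to other cells


freed = []  # names of cells reclaimed so far, in order


def incref(cell):
    cell.refcount += 1


def decref(cell):
    cell.refcount -= 1
    if cell.refcount == 0:
        # Nobody points at the cell any more: release what it references,
        # then reclaim the cell itself.
        for child in cell.children:
            decref(child)
        cell.children.clear()
        freed.append(cell.name)


def add_reference(parent, child):
    parent.children.append(child)
    incref(child)


a, b = Cell("a"), Cell("b")
incref(a)               # a root variable points at a
add_reference(a, b)     # a -> b
decref(a)               # the root variable goes away
print(freed)
```

Dropping the only root reference to a cascades: b is released first, then a.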
Cycles
l = []
l.append(l)
del l
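The same situation can be observed with a weak reference. A sketch assuming CPython: Node is a hypothetical class standing in for the list, because a plain list cannot be weakly referenced. The cycle detector is disabled for the duration so it cannot run on its own.

```python
import gc
import weakref


class Node:
    """Hypothetical container that references itself, like l.append(l)."""

    def __init__(self):
        self.me = self


gc.disable()               # keep the cycle detector from running on its own
node = Node()
probe = weakref.ref(node)  # observes the object without owning it

del node                   # the name is gone, but the self-reference remains
alive_after_del = probe() is not None

gc.collect()               # run the cycle detector explicitly
alive_after_collect = probe() is not None
gc.enable()

print(alive_after_del, alive_after_collect)
```

Reference counting alone never frees the object, since its count cannot reach zero; only the explicit collection does.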
PyGC_Head
typedef union _gc_head {
    struct {
        union _gc_head *gc_next;
        union _gc_head *gc_prev;
        Py_ssize_t gc_refs;
    } gc;
    double dummy; /* force worst-case alignment */
} PyGC_Head;
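The gc_refs field is the scratch space the cycle detector works with. A simplified model of that pass, written in Python: find_unreachable, refcounts and edges are hypothetical names, and plain dictionaries stand in for the linked list of tracked containers. The real code in CPython differs in detail.

```python
def find_unreachable(refcounts, edges):
    """Return the set of objects that only cycles keep alive.

    refcounts: dict mapping object name -> total reference count
    edges:     dict mapping object name -> list of names it references
    Both structures are hypothetical stand-ins for tracked containers.
    """
    # Step 1: copy each reference count into a scratch field (gc_refs).
    gc_refs = dict(refcounts)

    # Step 2: subtract every reference that comes from another tracked
    # object. What remains counts references from outside the set.
    for source in edges:
        for target in edges[source]:
            gc_refs[target] -= 1

    # Step 3: objects with gc_refs > 0 are reachable from outside;
    # everything they reference, transitively, is reachable as well.
    reachable = set()
    pending = [name for name, count in gc_refs.items() if count > 0]
    while pending:
        name = pending.pop()
        if name in reachable:
            continue
        reachable.add(name)
        pending.extend(edges.get(name, []))

    return set(refcounts) - reachable


# "a" and "b" reference each other and nothing else points at them.
# "c" is held by a variable (one external reference) and points at "d".
refcounts = {"a": 1, "b": 1, "c": 1, "d": 1}
edges = {"a": ["b"], "b": ["a"], "c": ["d"], "d": []}
garbage = find_unreachable(refcounts, edges)
print(sorted(garbage))
```

Note that d ends with gc_refs equal to zero yet survives, because it is reached from c during step 3.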
CPython Memory Allocator
Demo
Reference counting
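Pros: It is incremental: memory is freed as the program runs, as soon as an object's count drops to zero.
Cons: Counting alone cannot reclaim cycles; detecting them takes extra work.
Cons: Every object carries a reference count, a per-object size overhead.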
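The incremental behaviour is easy to observe on CPython. A small sketch: Resource and events are hypothetical names, and the ordering shown relies on CPython's reference counting, so other implementations may finalize the object later.

```python
events = []


class Resource:
    """Hypothetical object that records when it is finalized."""

    def __del__(self):
        events.append("finalized")


res = Resource()
events.append("before del")
del res                      # the last reference disappears here
events.append("after del")

print(events)
```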
PyPy
Mark and Sweep Algorithm
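A toy model of the two phases, written in Python. This is a sketch of the basic algorithm only, not PyPy's actual collector: Obj, mark, sweep and heap are hypothetical names, and the "heap" is a plain list.

```python
class Obj:
    """Toy heap object (hypothetical) with a mark bit."""

    def __init__(self, name):
        self.name = name
        self.refs = []       # outgoing references
        self.marked = False


def mark(roots):
    # Mark phase: flag everything reachable from the roots.
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if not obj.marked:
            obj.marked = True
            stack.extend(obj.refs)


def sweep(heap):
    # Sweep phase: keep marked objects, drop the rest, reset the marks.
    survivors = [obj for obj in heap if obj.marked]
    for obj in survivors:
        obj.marked = False
    return survivors


a, b, c, d = Obj("a"), Obj("b"), Obj("c"), Obj("d")
a.refs.append(b)             # a -> b, reachable from the root
c.refs.append(d)             # c <-> d form a cycle with no root
d.refs.append(c)

heap = [a, b, c, d]
mark(roots=[a])
heap = sweep(heap)
print([obj.name for obj in heap])
```

The cycle between c and d is dropped without any special handling, because neither object is reachable from a root.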
Mark and sweep
Pros: Can collect cycles.
Cons: A basic implementation stops the world: the mutator is paused while the collector runs.
Questions?
The End