Python and Ruby VMs: CPython and Matz's Ruby Implementation details

Upload: dmitri-babaev

Post on 06-May-2015


DESCRIPTION

Moscow Big Systems/Big Data, April 2013 meetup presentation slides

TRANSCRIPT

Python and Ruby VMs

CPython and Matz's Ruby Implementation details

Why should you care about Ruby

● Opscode Chef
● Puppet
● VMware Cloud Foundry
● Red Hat OpenShift
● Redmine

Why should you care about Python

● OpenStack
● Mercurial
● Bazaar

Matz's Ruby Implementation (MRI) / Yet another Ruby VM (YARV)

ruby-lang.org

Matz's Ruby Implementation (MRI) / Yet another Ruby VM (YARV) outline

● Memory management
  ○ Automatic, full-heap mark-sweep GC
● Execution model
  ○ Bytecode interpretation (stack machine) from 1.9 (YARV)
  ○ Direct AST interpretation before 1.9 (MRI)
● Concurrency
  ○ Multi-threaded, one active interpreter thread at a time
  ○ Green threads before 1.9 (MRI), OS-level threads in 1.9 (YARV)
● Method calls
  ○ Late binding, search for the method in the class dict by name

Typical interpreter execution model

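[Diagram: a script (for example, an "if" with branches a=1 and a=2) goes through parsing into an AST and then through bytecode generation into a sequence of instructions (Instruction a, Instruction b, Instruction c, ...) with a pointer to the currently executed instruction. The runtime state consists of the interpreter thread stacks and the heap.]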
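The pipeline in the diagram can be reproduced with CPython's standard library. The sketch below is only an illustration in Python (the same stages apply to YARV): the source string and the `flag` variable are made up for the example, echoing the a=1 / a=2 branch above.

```python
import ast
import dis

# Hypothetical script: one conditional assignment, like the diagram's "if".
source = "a = 1 if flag else 2"

# Stage 1: parsing turns the script text into an abstract syntax tree.
tree = ast.parse(source)
print(ast.dump(tree))

# Stage 2: bytecode generation compiles the AST into a code object.
code = compile(tree, filename="<example>", mode="exec")

# The instruction sequence that the interpreter loop steps through.
dis.dis(code)

# Stage 3: interpretation. The branch taken depends on the runtime value.
namespace = {"flag": True}
exec(code, namespace)
print(namespace["a"])
```

The exact opcodes printed by `dis` differ between CPython versions, so the listing is best read as an example of a stack-machine program rather than a stable format.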

GIL ownership diagram

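[Diagram: a timeline with three rows: Thread 1, Thread 2 and GIL state. Each thread alternates between interpreting, IO and waiting for the lock. The GIL is owned by Thread 1, then by Thread 2, is free while both threads are in IO, and is then owned by Thread 1 again.]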
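The effect shown in the diagram can be observed from Python. In this sketch `time.sleep` stands in for blocking IO (CPython releases the GIL while a thread blocks in it); the function and thread names are invented for the example.

```python
import threading
import time

def io_task(results, name, seconds):
    # time.sleep stands in for blocking IO; the GIL is released while it blocks,
    # so the other thread is free to run.
    time.sleep(seconds)
    results.append(name)

results = []
threads = [
    threading.Thread(target=io_task, args=(results, f"thread-{i}", 0.3))
    for i in (1, 2)
]

start = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# The two waits overlap, so the total is close to one wait, not two.
print(f"two 0.3 s waits took {elapsed:.2f} s")
```

CPU-bound work would not overlap this way, because only the thread that owns the GIL can interpret bytecode.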

MRI memory allocation diagram

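[Diagram: the heap contains object pools (Object pool 1, Object pool 2), each with its own free list (Free list 1, Free list 2). RArray data and RString data are stored in separate blocks on the heap, outside the pools.]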

MRI memory allocation

● Any Ruby object is allocated on the heap (even local variables)
● SLAB-like allocation for Ruby objects
  ○ A C union is used, hence all objects are of the same size (40 bytes)
  ○ Unlike a typical SLAB allocator, there is only one size of objects to store
● RString, RArray, RHash, etc. hold a pointer to an external memory block containing the actual contents

MRI memory allocation (continued)

● External memory block for string or array is allocated using plain malloc

● String content can be shared between several objects (copy on write)

● 1.9 changes: small strings (23 bytes or less) are embedded into RString structure rather than allocated externally

MRI GC

● If there is no free slot for an object, GC is run
  ○ If there is still no free slot, a new slab (pool) is allocated
    ■ Unlike in Java, GC is not triggered only when the whole heap is utilized
● Stop-the-world mark-sweep GC
  ○ Unlike Java or .NET, there are no generations
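The allocation and collection policy above can be sketched as a toy model. This is not MRI's code: the class `ToyHeap`, the pool size of 4 slots and the explicit `roots` list are all assumptions made to keep the example small. It only shows the order of events: run out of free slots, run mark-sweep, and grow by one pool if nothing was freed.

```python
SLOTS_PER_POOL = 4  # hypothetical; chosen small so the example fills up quickly


class ToyHeap:
    """Toy model: fixed-size slots, a free list, stop-the-world mark-sweep."""

    def __init__(self):
        self.slots = []      # every slot has the same size: one list entry
        self.free_list = []  # indices of free slots
        self.gc_runs = 0
        self._add_pool()

    def _add_pool(self):
        start = len(self.slots)
        self.slots.extend([None] * SLOTS_PER_POOL)
        self.free_list.extend(range(start, start + SLOTS_PER_POOL))

    def allocate(self, value, roots):
        if not self.free_list:
            self.collect(roots)       # no free slot: run GC first
            if not self.free_list:
                self._add_pool()      # still nothing free: allocate a new pool
        index = self.free_list.pop()
        self.slots[index] = {"value": value, "refs": []}
        return index

    def collect(self, roots):
        self.gc_runs += 1
        marked = set()
        stack = list(roots)
        while stack:                  # mark phase: follow references from roots
            i = stack.pop()
            if i not in marked:
                marked.add(i)
                stack.extend(self.slots[i]["refs"])
        for i, obj in enumerate(self.slots):  # sweep phase: free unmarked slots
            if obj is not None and i not in marked:
                self.slots[i] = None
                self.free_list.append(i)


heap = ToyHeap()
roots = [heap.allocate("kept", [])]
for n in range(3):
    heap.allocate(f"garbage-{n}", roots)    # never referenced from the roots

# The pool is full: this allocation triggers GC and reuses a freed slot.
roots.append(heap.allocate("new", roots))
print(heap.gc_runs, len(heap.slots))

# Fill the pool with live objects, then allocate once more:
# GC runs, frees nothing, and a second pool is added.
roots.append(heap.allocate("a", roots))
roots.append(heap.allocate("b", roots))
heap.allocate("c", roots)
print(heap.gc_runs, len(heap.slots))
```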

MRI GC (continued)

● 1.9.3 changes: lazy sweep GC
  ○ "In Lazy sweeping, each invocation of the object allocation sweeps the heap until it finds an appropriate free object"
    ■ i.e. just search for an object marked as dead instead of building free lists
● 2.0 changes
  ○ Instead of marking live objects with the FL_MARK flag, an external bitmap is created
    ■ This avoids excessive copying of memory regions in forked processes

MRI Links

● Threads in Ruby discussion: http://stackoverflow.com/questions/56087/does-ruby-have-real-multithreading

● MRI GC slides: http://timetobleed.com/garbage-collection-slides-from-la-ruby-conference/

CPython

python.org

CPython VM outline

● Memory management
  ○ Automatic, reference counting
● Execution model
  ○ Bytecode interpretation (stack machine)
  ○ Maps, lists, tuples are created and managed by bytecode instructions
● Concurrency
  ○ Multi-threaded, one active interpreter thread at a time
● Method calls
  ○ Late binding, search for the method in the class dict by name
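Two points of the outline can be checked directly from Python. The sketch below is illustrative only: `build`, `Greeter` and `shout` are names invented for the example, and the full opcode listing varies between CPython versions.

```python
import dis


def build(a, b):
    # A list, a dict and the enclosing tuple are each built by one instruction.
    return [a, b], {a: b}


opnames = [ins.opname for ins in dis.get_instructions(build)]
print(opnames)


class Greeter:
    def greet(self):
        return "hello"


g = Greeter()

# The method is stored in the class dict under its name.
found_in_class_dict = "greet" in Greeter.__dict__
before = g.greet()


def shout(self):
    return "HELLO"


# Late binding: rebinding the name in the class changes what an
# already existing instance calls, because lookup happens at call time.
Greeter.greet = shout
after = g.greet()

# The lookup really is by name: a string works just as well.
dynamic = getattr(g, "greet")()
print(before, after, dynamic)
```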

Python GC

● CPython uses reference counting to track object liveness
  ○ Python uses a global interpreter lock in order to avoid synchronization on each reference operation
● Cyclic references
  ○ Example: l = []; l.append(l); del l
  ○ Cyclic references are only possible for "container" objects
● The GC for cyclic references has been included since version 2.0 and is enabled by default
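The difference between the two mechanisms can be observed with weak references. This is a minimal sketch for CPython; the `Node` class is a stand-in for the slide's list example, used because plain lists cannot be weakly referenced.

```python
import gc
import weakref


class Node:
    """A container object: it can hold a reference to another object."""

    def __init__(self):
        self.other = None


gc.disable()   # keep the cycle collector from running on its own
gc.collect()   # start from a clean state

# Reference counting: the object is freed when its last reference goes away.
a = Node()
watch_a = weakref.ref(a)
del a
freed_by_refcount = watch_a() is None

# A self-reference keeps the count above zero, so the object survives "del".
b = Node()
b.other = b
watch_b = weakref.ref(b)
del b
alive_after_del = watch_b() is not None

# The cyclic GC detects that the cycle is unreachable and frees it.
gc.collect()
freed_by_gc = watch_b() is None

gc.enable()
print(freed_by_refcount, alive_after_del, freed_by_gc)
```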

Search for cyclic references in CPython (generations)

● The GC classifies objects into three generations depending on how many collection sweeps they have survived
  ○ New objects are placed in the youngest generation (generation 0)
  ○ If an object survives a collection it is moved into the next older generation
  ○ Since generation 2 is the oldest generation, objects in that generation remain there after a collection

Search for cyclic references in CPython (activation)

● When the number of allocations minus the number of deallocations exceeds the first threshold (see gc.get_threshold), collection starts
  ○ Initially only generation 0 is examined
  ○ If generation 0 has been examined more than the second threshold times since generation 1 was last examined, then generation 1 is examined as well
  ○ The third threshold controls the number of collections of generation 1 before collecting generation 2
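The thresholds and the current counters can be inspected and tuned through the `gc` module. The sketch below deliberately does not assume particular default values, since they differ between CPython versions; the value 1000 is an arbitrary choice for the example.

```python
import gc

# (threshold0, threshold1, threshold2), as described in the bullets above.
thresholds = gc.get_threshold()

# Current counters for generations 0, 1 and 2.
counts = gc.get_count()

print("thresholds:", thresholds)
print("counts:", counts)

# Tuning: a larger first threshold means generation 0 is examined less often.
gc.set_threshold(1000, thresholds[1], thresholds[2])
tuned = gc.get_threshold()

# Restore the original settings.
gc.set_threshold(*thresholds)
```

Raising the first threshold trades memory held by unreachable cycles for fewer collector runs, which can help allocation-heavy programs that create few cycles.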

Objects with __del__ method in reference cycle

● Which __del__ method should be called first for two objects in a cycle?
  ○ After calling the first finalizer, the object cannot be freed, as the second finalizer may still access it
● Cycles that are referenced from objects with finalizers are added to a global list of uncollectable garbage (gc.garbage)
  ○ The program can access the global list and free cycles in a way that makes sense for the application

CPython links

● Python GC description: http://arctrix.com/nas/python/gc/

● GC module documentation: http://docs.python.org/2/library/gc.html

● Python method call description: http://css.dzone.com/articles/python-internals-how-callables-0