garbage collection and the ruby heap
Post on 10-Apr-2015
Garbage Collection and the Ruby Heap
Joe Damato
@joedamato
timetobleed.com
ice799 on github/irc
Tuesday, June 8, 2010
Ruby developers know...
Ruby is
fatboyke (flickr)
Ruby loves eating RAM
37prime (flickr)
this talk is about what ruby does with your RAM
let’s take a look inside the VM
ruby allocates memory from the OS
memory is broken up into slots
each slot holds one ruby object
a linked list called the ‘freelist’ points to all the empty slots on the ruby heap
when you need an object, it’s pulled off the freelist
if the freelist is empty, GC is run
GC finds non-reachable objects and adds them to the freelist
if the freelist is still empty (all slots were in use), another heap is allocated
all the slots on the new heap are added to the freelist
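The allocation path above can be sketched as a toy model in plain Ruby. This is not MRI's code: `ToyHeap`, `SLOTS_PER_HEAP`, `release` and the `@live` table are names invented for illustration, and liveness is simulated by hand rather than discovered by a real GC.

```ruby
# Toy model of the freelist, NOT MRI internals. All names are hypothetical.
class ToyHeap
  SLOTS_PER_HEAP = 4

  attr_reader :heaps, :gc_runs

  def initialize
    @heaps = 0
    @gc_runs = 0
    @freelist = []   # stands in for the linked list of empty slots
    @live = {}       # slot => true while the program still references it
    add_heap
  end

  def allocate
    collect if @freelist.empty?    # freelist empty: run GC
    add_heap if @freelist.empty?   # still empty: every slot was in use
    slot = @freelist.pop
    @live[slot] = true
    slot
  end

  # Simulate the program dropping its last reference to a slot.
  def release(slot)
    @live[slot] = false
  end

  private

  def add_heap
    base = @heaps * SLOTS_PER_HEAP
    @heaps += 1
    SLOTS_PER_HEAP.times { |i| @freelist << base + i }
  end

  def collect
    @gc_runs += 1
    dead = @live.keys.reject { |slot| @live[slot] }
    dead.each do |slot|
      @live.delete(slot)
      @freelist << slot
    end
  end
end

heap = ToyHeap.new
slots = Array.new(4) { heap.allocate }  # fills the first heap
heap.release(slots[0])
reused = heap.allocate                  # GC runs and recycles the released slot
extra = heap.allocate                   # GC finds nothing, so a second heap is added
puts "heaps=#{heap.heaps} gc_runs=#{heap.gc_runs} reused=#{reused}"
```

The second `allocate` after the heap fills up is the expensive case: a full GC pass that frees nothing, followed by a new heap.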
turns out, Ruby’s GC is also one of the reasons it can be so slow
antphotos (flickr)
Matz’ Ruby Interpreter (MRI 1.8) has a...
john_lam (flickr)
Conservative
lifeisaprayer (flickr)
Stop the World
benimoto (flickr)
Mark and Sweep
michaelgoodin (flickr)
Garbage Collector
kiksbalayon (flickr)
• conservative: the VM hands out raw pointers to ruby objects, so the GC must treat anything that looks like a pointer as a live reference
• stop the world: no ruby code can execute during GC
• mark and sweep: mark all objects in use, sweep away unmarked objects
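A minimal sketch of the mark and sweep idea, run over a hand-built object graph. `Node`, `mark` and `sweep` are illustrative names, not MRI functions, and the real collector works on heap slots rather than a Ruby array.

```ruby
# Toy mark-and-sweep. Node, mark and sweep are hypothetical names.
Node = Struct.new(:name, :refs, :marked)

def mark(node)
  return if node.marked
  node.marked = true
  node.refs.each { |child| mark(child) }   # follow references recursively
end

def sweep(heap)
  live, garbage = heap.partition(&:marked)
  live.each { |n| n.marked = false }       # reset marks for the next cycle
  [live, garbage]
end

a = Node.new("a", [], false)
b = Node.new("b", [], false)
c = Node.new("c", [], false)
d = Node.new("d", [], false)
a.refs << b      # a -> b
c.refs << d      # c -> d, but nothing reachable points at c
heap  = [a, b, c, d]
roots = [a]

roots.each { |r| mark(r) }
live, garbage = sweep(heap)
puts "live: #{live.map(&:name).join(', ')}"
puts "garbage: #{garbage.map(&:name).join(', ')}"
```

Both phases touch every object on the heap, which is why GC time grows with the number of objects.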
more objects = longer GC
longer GC = less time to run your ruby code
fewer objects = better performance
improve performance
1. remove unnecessary object allocations
object allocations are not free
2. avoid leaked references
not really memory ‘leaks’
you’re holding a reference to an object you no longer need. GC sees the reference, so it keeps the object around
the GC follows references recursively, so a reference to classA will ‘leak’ all these objects
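A small sketch of what a leaked reference looks like in application code. `LeakyStats` and `LeanStats` are hypothetical classes made up for this example. The first keeps every request reachable just to count them, the second keeps only the count.

```ruby
# Hypothetical classes for illustration; nothing here is from a real library.
class LeakyStats
  def initialize
    @requests = []
  end

  def record(request)
    @requests << request   # the request, and everything it references, stays reachable
  end

  def retained
    @requests.size
  end
end

class LeanStats
  attr_reader :count

  def initialize
    @count = 0
  end

  def record(_request)
    @count += 1            # only a number is kept; the request can be collected
  end
end

leaky = LeakyStats.new
lean  = LeanStats.new
1000.times do |i|
  request = { :id => i, :body => "x" * 1024 }
  leaky.record(request)
  lean.record(request)
end
puts "leaky keeps #{leaky.retained} requests reachable"
puts "lean counted #{lean.count} requests and holds none of them"
```

Each retained hash also pins its body string, so one forgotten array keeps two objects per request alive.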
useful tools
• ltrace
• GC tuning
• ObjectSpace.each_object
• gdb.rb
• bleak_house
• heap dumping patches
• memprof
ltrace
• can use system ltrace
• mine is cooler
• http://github.com/ice799/ltrace/tree/libdl
• can trace GC, mysql queries, and more.
• linux only
ltrace
$ ltrace -F ltrace.conf -ttTg -x garbage_collect ruby gc.rb
15:39:22.637185 garbage_collect() = <void> <0.002420>
15:39:22.650797 garbage_collect() = <void> <0.005480>
15:39:22.677607 garbage_collect() = <void> <0.012134>
15:39:22.729645 garbage_collect() = <void> <0.024849>
15:39:22.828402 garbage_collect() = <void> <0.048067>
15:39:23.007304 garbage_collect() = <void> <0.089344>
15:39:23.339801 garbage_collect() = <void> <0.163595>
15:39:23.929944 garbage_collect() = <void> <0.297686>
GC tuning
Ruby Enterprise Edition contains a GC tuning patch
We use:
RUBY_GC_MALLOC_LIMIT=60000000
RUBY_HEAP_MIN_SLOTS=500000
RUBY_HEAP_SLOTS_GROWTH_FACTOR=1
RUBY_HEAP_SLOTS_INCREMENT=1
malloc_limit = 60MB
force garbage collection after every malloc_limit bytes worth of calls to malloc or realloc
defaults to 8MB
high traffic ruby servers can easily allocate and free more than 8MB in a single request
gc.c’s ruby_xmalloc wrapper is used by internal objects such as String, Array and Hash
void *
ruby_xmalloc(size)
    long size;
{
    void *mem;
    if (malloced > malloc_limit)
        garbage_collect();
    mem = malloc(size);
    malloced += size;
    return mem;
}
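To get a feel for what raising malloc_limit buys, here is a simplified Ruby model of the counter in the wrapper above. It assumes the counter is reset after each collection. The request profile (500 allocations of 100KB) is invented, and `gc_runs_for` is a hypothetical helper, not part of any Ruby API.

```ruby
# Simplified model of the malloc counter. Assumption: the counter is reset
# after every collection. gc_runs_for is a hypothetical helper.
def gc_runs_for(allocations, malloc_limit)
  malloced = 0
  runs = 0
  allocations.each do |size|
    if malloced > malloc_limit
      runs += 1        # garbage_collect() would be called here
      malloced = 0
    end
    malloced += size
  end
  runs
end

MB = 1_000_000
request = Array.new(500) { 100_000 }   # invented request: 500 allocations of 100KB

default_runs = gc_runs_for(request, 8 * MB)
tuned_runs   = gc_runs_for(request, 60 * MB)
puts "8MB limit:  #{default_runs} GC runs"
puts "60MB limit: #{tuned_runs} GC runs"
```

With these invented numbers the model collects 6 times under the 8MB default and not at all under the 60MB setting.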
HEAP_MIN_SLOTS = 500k
defaults to 10k
number of slots in the first slab
a new rails app boots up with almost 500k objects on the heap (mostly code)
(gdb) ruby objects nodes
 20996 NODE_CONST
 21620 NODE_SCOPE
 26329 NODE_LASGN
 26747 NODE_STR
 33178 NODE_METHOD
 40678 NODE_LIT
 79046 NODE_LVAR
 90646 NODE_NEWLINE
 95758 NODE_BLOCK
107357 NODE_CALL
150298 NODE_ARRAY
HEAP_SLOTS_GROWTH = 1
defaults to 1.8x
each new slab is almost twice as big as the last
normal growth:
10k
10k + 18k = 28k
10k + 18k + 36k = 64k
tuned growth:
500k
500k + 500k = 1M
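The growth arithmetic can be reproduced with a few lines of Ruby. This is a simplified model that assumes each new slab is exactly the growth factor times the previous one. `slab_sizes` is a hypothetical helper. Applied strictly, a 1.8x factor gives a third slab of 32.4k rather than 36k, so the figures above look like they round "almost twice" up to 2x for that step.

```ruby
# Simplified model: each new slab is `factor` times the previous slab.
# slab_sizes is a hypothetical helper, not part of Ruby or REE.
def slab_sizes(first_slab, factor, count)
  sizes = [first_slab]
  (count - 1).times { sizes << (sizes.last * factor).round }
  sizes
end

default_slabs = slab_sizes(10_000, 1.8, 3)
tuned_slabs   = slab_sizes(500_000, 1, 2)

puts "default: #{default_slabs.inspect} total #{default_slabs.inject(:+)}"
puts "tuned:   #{tuned_slabs.inspect} total #{tuned_slabs.inject(:+)}"
```

Either way the point stands: the default needs several slabs, and several GC runs, before the heap can hold a freshly booted rails app, while the tuned settings start at that size.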
require 'pp'
types = Hash.new(0)
ObjectSpace.each_object do |obj|
  types[obj.class] += 1
end
pp types.sort_by{ |klass,num| num }

[ ..., [Module, 18], [Class, 158], [String, 1725]]
* on Ruby 1.9, use ObjectSpace.count_objects
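A short sketch of the Ruby 1.9+ alternative. `ObjectSpace.count_objects` returns a hash keyed by internal type (`:T_STRING`, `:T_ARRAY`, ...) plus `:TOTAL` and `:FREE` slot counts, so it reports types rather than classes. The actual numbers depend on the Ruby version and on what has been loaded.

```ruby
counts = ObjectSpace.count_objects

# Drop the two slot totals so only per-type counts remain.
by_type = counts.reject { |key, _| key == :TOTAL || key == :FREE }
top = by_type.sort_by { |_, num| num }.last(3)

top.each { |type, num| puts "#{type}: #{num}" }
puts "slots: #{counts[:TOTAL]}, free: #{counts[:FREE]}"
```

Unlike each_object, this does not yield every object to a block, so it is cheap enough to call often.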
gdb.rb: gdb hooks for REE
• http://github.com/tmm1/gdb.rb
• attach to a running REE process and inspect the heap
• number of nodes by type
• number of objects by class
• number of strings by content
• number of arrays/hash by size
• uses gdb7 + python scripting
• linux only
(gdb) ruby objects strings
    140 u 'lib'
    158 u '0'
    294 u '\n'
    619 u ''
  30503 unique strings
3187435 bytes
(gdb) ruby objects
HEAPS 8
SLOTS 1686252
LIVE 893327 (52.98%)
FREE 792925 (47.02%)
scope 1641 (0.18%)
regexp 2255 (0.25%)
data 3539 (0.40%)
class 3680 (0.41%)
hash 6196 (0.69%)
object 8785 (0.98%)
array 13850 (1.55%)
string 105350 (11.79%)
node 742346 (83.10%)
fixing a leak in rails_warden
(gdb) ruby objects classes
1197 MIME::Type
2657 NewRelic::MetricSpec
2719 TZInfo::TimezoneTransitionInfo
4124 Warden::Manager
4124 MethodOverrideForAll
4124 AccountMiddleware
4124 Rack::Cookies
4125 ActiveRecord::ConnectionManagement
4125 ActionController::Session::CookieStore
4125 ActionController::Failsafe
4125 ActionController::ParamsParser
4125 Rack::Lock
4125 ActionController::Dispatcher
4125 ActiveRecord::QueryCache
middleware chain leaking per request
god memory leaks
(gdb) ruby objects arrays
elements instances
   94310 3
   94311 3
   94314 2
   94316 1
5369 arrays
2863364 elements
arrays with 94k+ elements!
(gdb) ruby objects classes
 43 God::Process
 43 God::Watch
 43 God::Driver
 43 God::DriverEventQueue
 43 God::Conditions::MemoryUsage
 43 God::Conditions::ProcessRunning
 43 God::Behaviors::CleanPidFile
 45 Process::Status
 86 God::Metric
327 God::System::SlashProcPoller
327 God::System::Process
406 God::DriverEvent
bleak_house
• http://github.com/fauna/bleak_house
• installs a custom patched version of ruby
• tells you what is leaking (like gdb.rb and ObjectSpace), but also where the leak is happening
191691 total objects
Final heap size 191691 filled, 220961 free
Displaying top 20 most common line/class pairs
89513 __null__:__null__:__node__
41438 __null__:__null__:String
2348 site_ruby/1.8/rubygems/specification.rb:557:Array
1508 gems/specifications/gettext-1.9.gemspec:14:String
100000 file.rb:123:String
useful, 100k strings on this line
but..sometimes it’s not enough
what is actually inside these strings?
heap dumping patch
• simple patch to ruby VM (300 lines of C)
• http://gist.github.com/73674
• simple text based output format
0x154750 @ -e:1 is OBJECT of type: T
0x15476c @ -e:1 is HASH which has data
0x154788 @ -e:1 is ARRAY of len: 0
0x1547dc @ -e:1 is STRING len: 1 and val: T
0x154814 @ -e:1 is CLASS named: T inherits from Object
0x154a98 @ -e:1 is STRING len: 2 and val: hi
0x154b40 @ -e:1 is OBJECT of type: Range
$ cat /tmp/ruby.heap | awk '{ print $3 }' | sort | uniq -c | sort -g | tail -1
236840 memcached/memcached.rb:316
$ grep "memcached.rb:316" /tmp/ruby.heap | awk '{ print $5 }' | sort | uniq -c | sort -g | tail -2
64952 HASH
123290 STRING
$ wc -l /tmp/ruby.heap
1571529 /tmp/ruby.heap
$ grep "memcached.rb:316" /tmp/ruby.heap | grep STRING | awk '{ print $10 }' | sort | uniq -c | sort -g | tail -2
72095 int(11)
79979 varchar(255)
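The same grouping can be done in Ruby instead of awk. The sketch below parses the sample lines from the dump format shown earlier and counts objects by location, by type and by string contents, using the same whitespace-separated fields as the awk commands ($3, $5 and $10). A real script would stream /tmp/ruby.heap instead, for example with `File.foreach`.

```ruby
# Sample lines in the heap dump text format; a real run would read the
# dump file line by line instead of this heredoc.
SAMPLE_DUMP = <<~DUMP
  0x154750 @ -e:1 is OBJECT of type: T
  0x15476c @ -e:1 is HASH which has data
  0x154788 @ -e:1 is ARRAY of len: 0
  0x1547dc @ -e:1 is STRING len: 1 and val: T
  0x154814 @ -e:1 is CLASS named: T inherits from Object
  0x154a98 @ -e:1 is STRING len: 2 and val: hi
  0x154b40 @ -e:1 is OBJECT of type: Range
DUMP

by_location = Hash.new(0)   # "file:line" => count        (awk's $3)
by_type     = Hash.new(0)   # "STRING", "HASH" => count   (awk's $5)
string_vals = Hash.new(0)   # string contents => count    (awk's $10)

SAMPLE_DUMP.each_line do |line|
  fields = line.split
  by_location[fields[2]] += 1
  by_type[fields[4]] += 1
  string_vals[fields[9]] += 1 if fields[4] == "STRING"
end

top_location, count = by_location.max_by { |_, n| n }
puts "#{count} #{top_location}"
by_type.sort_by { |_, n| n }.each { |type, n| puts "#{n} #{type}" }
```

Like the awk version, this only sees the first whitespace-separated word of each string value.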
memprof goals
• easy to use: no patching the VM, just require the gem
• detailed: include file/line (bleak_house), object contents (heap dumping patch), but also information about references between objects
• simple analysis: allow processing via various languages and databases using simple JSON data format
memprof
• http://github.com/ice799/memprof
• gem install memprof
• under active development on github
• works on x86_64 linux and x86_64 osx
• for best results, use an RVM built 1.8.x or REE
• ruby 1.9 support in the works
• 32bit support in the works
memprof under the hood
• rewrites your Ruby binary in memory
• injects short trampolines for all calls to internal VM functions to do tracking
• uses libdwarf and libelf to access VM internals like the ruby heap slabs
• uses libyajl to dump out ruby objects as json
http://timetobleed.com/string-together-global-offset-tables-to-build-a-ruby-memory-profiler/
http://timetobleed.com/memprof-a-ruby-level-memory-profiler/
http://timetobleed.com/what-is-a-ruby-object-introducing-memprof-dump/
http://timetobleed.com/hot-patching-inlined-functions-with-x86_64-asm-metaprogramming/
http://timetobleed.com/rewrite-your-ruby-vm-at-runtime-to-hot-patch-useful-features/
memprof features
• memprof.track
• memprof.dump
• memprof.dump_all
• memprof.com
Memprof.track{
  100.times{ "abc" }
  100.times{ 1.23 + 1 }
  100.times{ Module.new }
}
100 file.rb:2:String
100 file.rb:3:Float
100 file.rb:4:Module
• like bleak_house, but for a given block of code
• use Memprof::Middleware in your webapps to run track per request
strings
Memprof.dump{
  "hello" + "world"
}
{
  "_id": "0x19c610",
  "file": "file.rb",
  "line": 2,
  "type": "string",
  "class": "0x1ba7f0",
  "class_name": "String",
  "length": 10,
  "data": "helloworld"
}
• "_id": memory address of object
• "file", "line": file and line where string was created
• "class": address of the class “String”
• "length", "data": length and contents of this string instance
arrays
Memprof.dump{
  [ 1, :b, 2.2, "d" ]
}
{
  "_id": "0x19c5c0",
  "class": "0x1b0d18",
  "class_name": "Array",
  "length": 4,
  "data": [ 1, ":b", "0x19c750", "0x19c598" ]
}
• integers and symbols are stored in the array itself
• floats and strings are separate ruby objects
hashes
Memprof.dump{
  { :a => 1, "b" => 2.2 }
}
{
  "_id": "0x19c598",
  "type": "hash",
  "class": "0x1af170",
  "class_name": "Hash",
  "default": null,
  "length": 2,
  "data": [ [ ":a", 1 ], [ "0xc728", "0xc750" ] ]
}
• "default": no default proc
• "data": hash entries as key/value pairs
classes
Memprof.dump{
  class Hello
    @@var=1
    Const=2
    def world() end
  end
}
{
  "_id": "0x19c408",
  "type": "class",
  "name": "Hello",
  "super": "0x1bfa48",
  "super_name": "Object",
  "ivars": { "@@var": 1, "Const": 2 },
  "methods": { "world": "0x19c318" }
}
• "super": superclass object reference
• "ivars": class variables and constants are stored in the instance variable table
• "methods": references to method objects
Memprof.dump_all("myapp_heap.json")
• dump out every single live object as json
• one per line to specified file
• analyze via
• jsawk/grep
• mongodb/couchdb
• custom ruby scripts
• libyajl + Boost Graph Library
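As a rough idea of what a custom ruby script over such a dump might look like, the sketch below parses one JSON object per line, counts objects per file:line:class, and looks up which objects reference a given address. The records are shaped like the Memprof.dump examples in this talk, but the third record and the `referrers` helper are invented for illustration. A real script would read the dump file with `File.foreach`.

```ruby
require "json"

# Records shaped like the Memprof.dump examples; the third one is invented
# so that the array has something to point at.
SAMPLE_LINES = [
  '{"_id":"0x19c610","file":"file.rb","line":2,"type":"string","class_name":"String","length":10,"data":"helloworld"}',
  '{"_id":"0x19c5c0","class_name":"Array","length":4,"data":[1,":b","0x19c750","0x19c598"]}',
  '{"_id":"0x19c598","file":"file.rb","line":2,"type":"string","class_name":"String","length":1,"data":"d"}'
]

objects = SAMPLE_LINES.map { |line| JSON.parse(line) }

# Count objects per "file:line:class", in the spirit of the bleak_house report.
by_site = Hash.new(0)
objects.each do |obj|
  site = [obj["file"] || "?", obj["line"] || "?", obj["class_name"]].join(":")
  by_site[site] += 1
end

# Hypothetical helper: objects whose "data" array mentions the given address.
def referrers(objects, address)
  objects.select do |obj|
    obj["data"].is_a?(Array) && obj["data"].flatten.include?(address)
  end
end

by_site.sort_by { |_, n| -n }.each { |site, n| puts "#{n} #{site}" }
holders = referrers(objects, "0x19c598").map { |obj| obj["_id"] }
puts "0x19c598 is referenced by #{holders.inspect}"
```

Walking references backwards like this is what makes it possible to ask "who is keeping this object alive?".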
memprof.com
a web-based heap visualizer and leak analyzer
plugging a leak in rails3
• in dev mode, rails3 is leaking 10MB per request
# in environment.rb
require `gem which memprof/signal`.strip
let’s use memprof to find it!
tell memprof to dump out the entire heap to json
$ memprof --pid <pid> --name <dump name> --key <api key>
send the app some requests so it leaks
$ ab -c 1 -n 30 http://localhost:3000/
2519 classes
30 copies of TestController
mongo query for all TestController classes
details for one copy of TestController
find references to object
holding references to all controllers
“leak” is on line 178
• In development mode, Rails reloads all your application code on every request
• ActionView::Partials::PartialRenderer is caching partials used by each controller as an optimization
• But.. it ends up holding a reference to every single reloaded version of those controllers
more* memprof features
• memprof.trace
• memprof::tracer
* currently under development
config.middleware.use(Memprof::Tracer)
{
  "time": 4.3442,
  "rails": { "controller": "test", "action": "index" },
  "request": { "REQUEST_PATH": "/test", "REQUEST_METHOD": "GET" },
• "time": total time for request
• "rails": rails controller/action
• "request": request env info
  "mysql": { "queries": 3, "time": 0.00109302 },
  "gc": { "calls": 8, "time": 2.04925 },
• "mysql": 3 mysql queries
• "gc": 8 calls to GC, 2 secs spent in GC
  "objects": {
    "created": 3911103,
    "types": {
      "none": 1168831,
      "object": 1127,
      "float": 627,
      "string": 1334637,
      "array": 609313,
      "hash": 3676,
      "match": 70211
    }
  }
}
• 3 million objs created
• lots of strings
• lots of arrays
• regexp matches
• object instances
• 1 million method calls
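The numbers in this trace are easier to read as ratios. The sketch below takes the "time", "mysql" and "gc" fields from the tracer output above and works out how much of the request went to GC. The variable names are arbitrary, and only fields shown on the slides are used.

```ruby
require "json"

# Fields copied from the tracer output; the rest of the record is omitted.
trace = JSON.parse(<<~TRACE)
  {
    "time": 4.3442,
    "mysql": { "queries": 3, "time": 0.00109302 },
    "gc": { "calls": 8, "time": 2.04925 }
  }
TRACE

gc_share    = trace["gc"]["time"] / trace["time"]
mysql_share = trace["mysql"]["time"] / trace["time"]
per_gc_call = trace["gc"]["time"] / trace["gc"]["calls"]

puts format("GC:    %.1f%% of request time", gc_share * 100)
puts format("mysql: %.2f%% of request time", mysql_share * 100)
puts format("average GC call: %.3f s", per_gc_call)
```

Close to half of this request is spent in the garbage collector, while the three mysql queries barely register.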
more objects = longer GC
longer GC = less time to run your ruby code
fewer objects = better performance
Use these tools.
Questions?
Joe Damato
@joedamato
timetobleed.com
ice799 on github/irc