wtf is a method cache?
DESCRIPTION
Talk about the method caching patches I wrote that led to jamesgolick ruby. Given at RuPy 2013 in Budapest.
TRANSCRIPT
BitLove
What the fuck is a method cache?
James Golick
Thursday, 17 July, 14
James Golick
writing: http://jamesgolick.com
code: https://github.com/jamesgolick
shit talk: https://twitter.com/jamesgolick
podcast: http://realtalk.io
@jamesgolick
DISCLAIMER
COMPUTOLOGY AHEAD
Big “O” Notation
stuff.each do |thing|
  # ...
end
Variable Time
a = stuff.pop
a = !!a
stuff.unshift a
Constant Time
What the fuck is a method cache?
1. Background
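struct RClass {
  struct RClass *super;    /* parent class; NULL at the top of the hierarchy */
  struct st_table *m_tbl;  /* method table: method name -> method entry */
};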
Method Resolution
class A
  def a
    puts 'Hi!'
  end
end

class B < A; end
class C < B; end
class D < C; end
class E < D; end
class F < E; end
F.new.a
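rb_method_entry_t *
vm_resolve_method(struct RClass *klass, symbol_t method_name)
{
  /* Simplified: the real st_lookup() hands the value back through an out parameter. */
  rb_method_entry_t *ent = st_lookup(klass->m_tbl, method_name);

  if (ent) {
    return ent;                                           /* defined on this class */
  } else if (klass->super) {
    return vm_resolve_method(klass->super, method_name);  /* try the parent */
  } else {
    return NULL;                                          /* ran out of ancestors */
  }
}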
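The same walk can be played with in plain Ruby. This is only a toy model of the slide above, not MRI's code: `ToyClass` and `resolve` are made-up names, and a Hash stands in for the `st_table`. It counts how many method tables get checked before `a` turns up.

```ruby
# Toy model of the uncached lookup. ToyClass and resolve are hypothetical names.
ToyClass = Struct.new(:name, :superclass, :m_tbl)

# Walk up the superclass chain until some method table has the name.
# Returns [entry, tables_checked]; entry is nil when nothing matches.
def resolve(klass, method_name)
  checked = 0
  while klass
    checked += 1
    entry = klass.m_tbl[method_name]
    return [entry, checked] if entry
    klass = klass.superclass
  end
  [nil, checked]
end

# A defines #a; B..F inherit in a straight line, like the slide.
a = ToyClass.new("A", nil, { a: "A#a" })
f = %w[B C D E F].reduce(a) { |parent, name| ToyClass.new(name, parent, {}) }

entry, checked = resolve(f, :a)
puts "found #{entry} after checking #{checked} method tables"
```

Every extra level between the receiver's class and the defining class costs one more hash lookup, which is why the depth of the hierarchy matters.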
Module Inclusion
module A
  def a
    "Hello, World!"
  end
end

module B; include A; end
module C; include B; end

class D
  include C
end
D.new.a
module A ICLASS ICLASS ICLASS
module B ICLASS ICLASS
module C ICLASS
class D
irb> ActiveRecord::Base.included_modules.length
=> 71
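One way to see the extra depth from stock Ruby is `Module#ancestors`, which lists the chain that lookup has to walk. The sketch below reuses the modules from the earlier slide, wrapped in a hypothetical `InclusionDemo` namespace so they do not clash with anything else.

```ruby
module InclusionDemo
  module A
    def a
      "Hello, World!"
    end
  end

  module B; include A; end
  module C; include B; end

  class D
    include C
  end
end

# ancestors starts at the class itself and lists included modules in lookup order.
chain = InclusionDemo::D.ancestors
puts chain.inspect

# How many entries are visited before the one that defines #a is reached.
depth = chain.index(InclusionDemo::A) + 1
puts "D.new.a is found at position #{depth} in the chain"
```

A class that only says `include C` still ends up with three extra links ahead of `Object`.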
Summary
• Methods are stored in a hashtable on the class where they’re defined.
• Method resolution is a variable time algorithm whose complexity depends on the depth of your class hierarchy.
• Module inclusion substantially increases the depth of your class hierarchy, especially if those modules themselves include modules.
• Method resolution is expensive.
What the fuck is a method cache?
2. Method Caching in the pre-jamesgolick era
Instruction Caches
static uint global_vm_state = 0;
struct inline_cache {
  struct RClass *klass;    /* receiver class when the entry was filled */
  uint vm_state;           /* global_vm_state when the entry was filled */
  rb_method_entry_t *me;   /* the resolved method */
};

rb_method_entry_t *
vm_search_method(struct RClass *klass, rb_symbol_t method_name, struct inline_cache *ic)
{
  rb_method_entry_t *me;

  if (is_valid_cache_entry(ic, klass)) {
    me = ic->me;
  } else {
    me = vm_resolve_method(klass, method_name);
    ic->me = me;
    ic->vm_state = GET_VM_STATE();
    ic->klass = klass;
  }
  return me;
}

int
is_valid_cache_entry(struct inline_cache *ent, struct RClass *klass)
{
  return ent->klass == klass && ent->vm_state == GET_VM_STATE();
}
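To get a feel for how an instruction cache behaves, here is a rough Ruby model of one call site. Everything in it is an assumption made for illustration: `ToyVM`, `ToyInlineCache` and `ToyCallSite` are invented names, and `instance_method` stands in for the expensive hierarchy walk.

```ruby
# Stand-in for the global_vm_state counter.
module ToyVM
  class << self
    attr_accessor :state
  end
  self.state = 0
end

ToyInlineCache = Struct.new(:klass, :vm_state, :me)

# One call site (one send instruction) with its own cache entry.
class ToyCallSite
  attr_reader :slow_lookups

  def initialize(method_name)
    @method_name = method_name
    @ic = ToyInlineCache.new
    @slow_lookups = 0
  end

  def search_method(klass)
    # Valid only if the class matches and nothing bumped the global state since.
    unless @ic.klass.equal?(klass) && @ic.vm_state == ToyVM.state
      @slow_lookups += 1
      @ic.me = klass.instance_method(@method_name) # stands in for the slow path
      @ic.klass = klass
      @ic.vm_state = ToyVM.state
    end
    @ic.me
  end
end

site = ToyCallSite.new(:upcase)
3.times { site.search_method(String) }
lookups_before = site.slow_lookups # one miss to fill the cache, then two hits

ToyVM.state += 1 # pretend some unrelated method was defined somewhere
site.search_method(String)
lookups_after = site.slow_lookups

puts "slow lookups before invalidation: #{lookups_before}, after: #{lookups_after}"
```

The call site never changed and neither did `String`, yet the bump forces it back onto the slow path.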
Global Method Cache
instruction cache
instruction cache
instruction cache
global cache
method resolution
struct method_cache_entry {
  struct RClass *klass;
  uint vm_state;
  rb_method_entry_t *me;
};

#define METHOD_CACHE_SIZE 2048
static struct method_cache_entry method_cache[METHOD_CACHE_SIZE];

rb_method_entry_t *
vm_resolve_method(struct RClass *klass, symbol_t method_name)
{
  struct method_cache_entry *ent;
  rb_method_entry_t *me;

  /* One entry per slot: a colliding fill simply overwrites whatever was there. */
  ent = &method_cache[method_name % METHOD_CACHE_SIZE];
  if (is_valid_cache_entry(ent, klass)) {
    me = ent->me;
  } else {
    me = vm_resolve_method_without_cache(klass, method_name);
    ent->me = me;
    ent->vm_state = GET_VM_STATE();
    ent->klass = klass;
  }
  return me;
}

int
is_valid_cache_entry(struct method_cache_entry *ent, struct RClass *klass)
{
  return ent->klass == klass && ent->vm_state == GET_VM_STATE();
}
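The "fixed size, no collision handling" part is easy to reproduce in a few lines of Ruby. This is a sketch under loud assumptions: the table has 4 slots instead of 2048 so collisions are easy to trigger, `TOY_IDS` is an invented stand-in for interned symbol ids, the sketch also compares the method name (which the slide code leaves out), and the `vm_state` check is dropped to keep the focus on eviction.

```ruby
TOY_CACHE_SIZE = 4 # the slides use 2048; tiny here so collisions are easy to see

# Hypothetical numeric ids standing in for interned symbols.
TOY_IDS = { upcase: 0, downcase: 4, size: 1 }.freeze

ToyGlobalEntry = Struct.new(:klass, :name, :me)

class ToyGlobalCache
  attr_reader :misses

  def initialize(size = TOY_CACHE_SIZE)
    @slots = Array.new(size)
    @misses = 0
  end

  def lookup(klass, name)
    index = TOY_IDS.fetch(name) % @slots.size
    ent = @slots[index]
    return ent.me if ent && ent.klass.equal?(klass) && ent.name == name

    # Miss: do the slow lookup and overwrite whatever lived in this slot.
    @misses += 1
    me = klass.instance_method(name)
    @slots[index] = ToyGlobalEntry.new(klass, name, me)
    me
  end
end

cache = ToyGlobalCache.new
cache.lookup(String, :upcase)   # miss, fills slot 0
cache.lookup(String, :upcase)   # hit
cache.lookup(String, :downcase) # miss, 4 % 4 == 0 so it overwrites :upcase
cache.lookup(String, :upcase)   # miss again, :upcase was evicted
puts "misses: #{cache.misses}"
```

Two hot methods that land in the same slot keep evicting each other, no matter how much room the rest of the table has.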
Cache Invalidation
static uint64_t global_vm_state = 0;
#define INC_VM_STATE() (global_vm_state++)

void
rb_define_method(struct RClass *klass, symbol_t name, rb_method_entry_t *me)
{
  // ...
  INC_VM_STATE();
  // ...
}
things that bust the cache
Defining methods.
Aliasing methods.
Removing methods.
Setting or removing constants.
Defining a class.
Defining a module.
Including a module.
Extending a module.
Using a refinement. (Ruby 2.0)
Garbage collecting a class.
Garbage collecting a module.
Changing the visibility of a constant.
Marshal loading an extended constant.
Autoload.
Built-in non-blocking IO methods.
OpenStruct instantiation.
Summary
• Method resolutions are cached in two places.
• Instruction caches are structs attached to the send instruction.
• The global method cache is a hash table fixed at 2048 entries with no collision semantics and a random eviction policy.
• Method cache entries are valid if their `vm_state` property is the same as the current value of the `global_vm_state` counter.
• Method cache invalidation is always global, and happens frequently in most ruby code.
• Method cache invalidation is constant time.
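The last two points are two sides of the same trick, which a toy model makes visible. `ToyStateVM` below is a hypothetical class, not anything from MRI: entries remember the counter value they were filled at, and "invalidation" is a single increment that never touches them.

```ruby
class ToyStateVM
  attr_reader :state

  def initialize
    @state = 0
    @filled_at = {} # [class name, method name] => counter value at fill time
  end

  def fill(klass, name)
    @filled_at[[klass, name]] = @state
  end

  # Any method definition bumps the one global counter. Constant time:
  # no cache entry is visited.
  def define_method_on(_klass)
    @state += 1
  end

  def valid_entries
    @filled_at.values.count(@state)
  end
end

vm = ToyStateVM.new
vm.fill("String", :upcase)
vm.fill("Array", :push)
vm.fill("User", :save)
valid_before = vm.valid_entries

vm.define_method_on("SomeUnrelatedClass")
valid_after = vm.valid_entries

puts "valid entries before: #{valid_before}, after: #{valid_after}"
```

Cheap to invalidate, expensive afterwards: every entry has to be refilled through the slow path even though none of those classes changed.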
Numbers
3. jamesgolick Method Caching
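struct RClass {
  struct RClass *super;
  struct st_table *m_tbl;             /* methods defined on this class */
  struct st_table *mc_tbl;            /* per-class method cache */
  uint64_t seq;                       /* current sequence number of this class */
  subclass_list_entry_t *subclasses;  /* direct subclasses, walked on invalidation */
};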
static uint64_t rb_vm_sequence = 0;
#define NEXT_SEQ() ++rb_vm_sequence
struct RClass *
class_alloc(...)
{
  struct RClass *klass;
  // ...
  klass->seq = NEXT_SEQ();  /* every class starts with its own unique sequence */
  // ...
  return klass;
}
struct inline_cache {
  uint64_t seq;            /* klass->seq when the entry was filled */
  rb_method_entry_t *me;
};

rb_method_entry_t *
vm_search_method(struct RClass *klass, rb_symbol_t method_name, struct inline_cache *ic)
{
  rb_method_entry_t *me;

  if (ic->seq == klass->seq) {
    me = ic->me;
  } else {
    me = vm_resolve_method(klass, method_name);
    ic->me = me;
    ic->seq = klass->seq;
  }
  return me;
}
struct method_cache_entry {
  uint64_t seq;            /* klass->seq when the entry was filled */
  rb_method_entry_t *me;
};

rb_method_entry_t *
vm_resolve_method(struct RClass *klass, symbol_t method_name)
{
  struct method_cache_entry *ent;
  rb_method_entry_t *me;

  ent = vm_get_method_cache_entry(klass, method_name);
  if (ent->seq == klass->seq) {
    me = ent->me;
  } else {
    me = vm_resolve_method_without_cache(klass, method_name);
    ent->me = me;
    ent->seq = klass->seq;
  }
  return me;
}
void
rb_clear_cache_by_class(struct RClass *klass)
{
  subclass_list_entry_t *ent;

  klass->seq = NEXT_SEQ();  /* entries tagged with the old seq stop matching */
  ent = klass->subclasses;
  while (ent != NULL) {
    rb_clear_cache_by_class(ent->klass);
    ent = ent->next;
  }
}
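Here is the same idea as a small Ruby model, to show which caches survive. `SeqNode` is a hypothetical stand-in for `RClass`, and the parent/child wiring between the Rails class names is my assumption, picked only to have a realistic-looking tree.

```ruby
class SeqNode
  @next_seq = 0

  class << self
    # Stand-in for NEXT_SEQ(): hands out a fresh, never reused number.
    def next_seq
      @next_seq += 1
    end
  end

  attr_reader :name, :seq, :subclasses

  def initialize(name, parent = nil)
    @name = name
    @seq = SeqNode.next_seq
    @subclasses = []
    parent.subclasses << self if parent
  end

  # Give this class and every descendant a new seq.
  # Returns how many classes were touched.
  def clear_cache
    @seq = SeqNode.next_seq
    1 + @subclasses.sum(&:clear_cache)
  end
end

object     = SeqNode.new("Object")
controller = SeqNode.new("ActionController::Base", object)
record     = SeqNode.new("ActiveRecord::Base", object)
users_c    = SeqNode.new("UsersController", controller)
sessions_c = SeqNode.new("SessionsController", controller)
user       = SeqNode.new("User", record)
group      = SeqNode.new("Group", record)

# What a cache entry would have stored when it was filled.
cached_user_seq    = user.seq
cached_users_c_seq = users_c.seq

# Define a method on ActiveRecord::Base.
touched = record.clear_cache

puts "classes touched: #{touched}"
puts "User entry still valid: #{user.seq == cached_user_seq}"
puts "UsersController entry still valid: #{users_c.seq == cached_users_c_seq}"
```

Only the subtree under the changed class pays; the controllers keep their cached entries. The price is that the walk visits every descendant, so a change near the top of a big tree is slow.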
Object
  ActionController::Base
    UsersController
    SessionsController
  ActiveRecord::Base
    User
    Group
Summary
• Both types of method cache entries now only need to store a seq and method entry.
• Method caches are now stored with the RClass structs and are “effectively” unbounded in size.
• Each RClass has a globally unique 64-bit identifier.
• Method cache entries are tagged with the sequence of their target klass at the time the cache entry was filled.
• Entries are valid if their filled entry sequence is the same as the current sequence identifier of the klass that is the target of the invocation.
• Method caches are invalidated by assigning a new sequence value to a klass.
• When changes are made to a klass, we traverse all of its descendants and assign them new sequence values.
• This traversal is unfortunately a variable time algorithm, and can be quite expensive.
4. rvm install jamesgolick
Dat Patch
• Top-down class hierarchy tracking.
• Class#subclasses
• Module#included_in
• Possible future bug fixes.
• Hierarchical method cache invalidation.
• Method cache instrumentation.
Instrumentation
• RubyVM::MethodCache.hits
• RubyVM::MethodCache.misses
• RubyVM::MethodCache.miss_time
• RubyVM::MethodCache.invalidation_time
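If you want to poke at these counters, something like the following should be a safe starting point. It is only a sketch: the four method names come from the slide, but their return types and units are not stated there, so the code just collects whatever they return. `method_cache_stats` is a hypothetical helper, and on a Ruby without the patch it returns nil instead of blowing up.

```ruby
# Collect the patch's counters if they exist on this Ruby, otherwise return nil.
def method_cache_stats
  return nil unless defined?(RubyVM::MethodCache)

  {
    hits: RubyVM::MethodCache.hits,
    misses: RubyVM::MethodCache.misses,
    miss_time: RubyVM::MethodCache.miss_time,
    invalidation_time: RubyVM::MethodCache.invalidation_time
  }
end

stats = method_cache_stats
if stats
  puts stats.inspect
else
  puts "RubyVM::MethodCache is not available on this Ruby"
end
```

Taking one snapshot before and one after a request and comparing the two is probably the most useful way to read them.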
Get The Code
• rvm install jamesgolick
• git clone git://github.com/jamesgolick/ruby.git
• https://github.com/jamesgolick/ruby
Questions?