As part of my compiler project, one of my imminent decisions is what object model to use, and since I like Ruby it seemed a good time to go through Ruby and look at the guts of the Ruby object model. If you've dabbled in meta-programming etc. for Ruby, this post probably doesn't contain much that is new to you. If you're a beginner you may want to look at a tutorial instead.
If you're somewhere in between, hopefully there may be some insights here and there - especially if you're interested specifically in how things work "under the hood" rather than just what is visible to Ruby.
The information in this post is based largely on the Ruby 1.8.x interpreter (you'll also see it referred to as MRI, Matz's Ruby Interpreter, to distinguish it from other Ruby implementations). We'll look at code fragments, as well as a diagram or two to illustrate.
The Ruby object model can make you go insane. _why has a great article that illuminates the particular hairy beast that is meta classes.
Conceptually, Ruby objects consist of the following:
MRI "cheats" in that objects of the built-in classes Class, Float, Bignum, Struct, String, Array, Regexp, Hash and File are not real Ruby objects: they don't have the hash table of instance variables. Instead they have a fixed structure. You can still set instance variables on these objects, but behind the scenes MRI (1.8.x at least) will then create an instance variable hash table and store it in a separate hash table keyed by the object. This saves a little space if (as is usually the case) most objects of these built-in classes don't have instance variables besides the "hardcoded" ones. For small objects like Float in particular this can be a significant saving.
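From the Ruby side this difference is invisible, as the following sketch suggests. It uses a String and an Array as representative built-in objects; the variable names are made up for illustration, and where MRI actually stores the values is an implementation detail that this code cannot observe.

```ruby
# Instance variables on built-in objects behave like those on any other object.
str = "hello".dup   # dup so the example also works where string literals are frozen
arr = [1, 2, 3]

str.instance_variable_set(:@origin, "user input")
arr.instance_variable_set(:@label, "small numbers")

p str.instance_variable_get(:@origin)
p arr.instance_variable_get(:@label)

# The ordinary contents of the objects are untouched.
p str
p arr
```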
(The following is updated per comment from Rick DeNatale below:)
For some other classes, there's not even a pointer. MRI takes advantage of the fact that there are certain values the object pointer won't take. When MRI determines the class of an "object" it uses the following procedure:
The diagram below shows the actual inheritance of Object, Module and Class in MRI as visible from C. If this doesn't make sense to you, read the section on #send below, but particularly keep in mind that:
So, for example, when sending a message to the object Class, you step out to the meta class "Class" (in blue) and look for the method there. If you don't find it, you follow the solid arrows until you do. This takes you, in order, through the Module meta class, the Object meta class, Class, Module, Object and Kernel. (Since classes in Ruby are open, you might also find that someone has reopened a class and caused additional modules etc. to be inserted into the inheritance chain.)
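That chain can be poked at from plain Ruby. In the sketch below the two probe methods are hypothetical names added only to show where a lookup ends up; nothing like them exists in Ruby itself.

```ruby
# Hypothetical probe in the Object meta class (a singleton method on Object).
def Object.probe_in_object_meta
  :object_metaclass
end

# Hypothetical probe defined as an ordinary instance method of Module.
class Module
  def probe_in_module
    :module
  end
end

p Class.probe_in_object_meta       # found in the Object meta class
p Class.probe_in_module            # found in Module, after the meta classes
p Class.respond_to?(:puts, true)   # Kernel sits at the end of the chain
```

Neither probe is defined on Class or its own meta class, so both calls only succeed because the lookup keeps following the chain.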
Because Ruby's #class method skips metaclasses, Class.class returns "Class": it follows the same chain as above, but skips the blue boxes (and any included modules).
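A small sketch of that skipping behaviour, using a throwaway class named Widget (a hypothetical name) and the classic `class << obj; self; end` idiom to get hold of the metaclass:

```ruby
class Widget; end

# The metaclass (singleton class) of Widget.
widget_meta = class << Widget; self; end

p Widget.class                        # => Class
p Class.class                         # => Class
p widget_meta.equal?(Widget.class)    # => false
```

The metaclass is a distinct object that sits in front of Class, but #class never reports it.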
To explore the class/metaclass inheritance in Ruby, you can use this minimal extension to see what values MRI actually operates on:
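#include "ruby.h"

/* Return the raw superclass pointer, without skipping anything. */
VALUE real_super(VALUE self) {
    return RCLASS(self)->super;
}

/* Return the raw class pointer, which may be a metaclass. */
VALUE real_klass(VALUE self) {
    return RBASIC(self)->klass;
}

/* Called by MRI when the extension is loaded. */
void Init_inheritance() {
    rb_define_method(rb_cClass, "real_super", real_super, 0);
    rb_define_method(rb_cClass, "real_klass", real_klass, 0);
}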
Put the above in "inheritance.c" and create extconf.rb:
require 'mkmf'
extension_name = 'inheritance'
dir_config(extension_name)
create_makefile(extension_name)
To use it do "ruby extconf.rb ; make", and then:
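require 'inheritance'   # load the compiled extension (on 1.8 the current directory is on the load path)
p Object.real_klass
p Object.real_super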
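If building a C extension is not an option, a rough pure-Ruby approximation is possible on Ruby 1.9.2 and later, where Object#singleton_class exists. The helper names below are hypothetical, and the results are only stand-ins: as I understand it, the raw pointers in MRI can refer to internal proxy objects that Ruby code never sees directly.

```ruby
module Greeting; end

class Base; end

class Derived < Base
  include Greeting
end

# Rough stand-in for real_klass: the metaclass of a class.
def approx_real_klass(cls)
  cls.singleton_class
end

# Rough stand-in for real_super: the next link in the ancestor chain.
# Unlike #superclass, this does not skip included modules.
# It does not account for prepended modules.
def approx_real_super(cls)
  cls.ancestors[1]
end

p approx_real_klass(Derived)   # => #<Class:Derived>
p approx_real_super(Derived)   # => Greeting
p Derived.superclass           # => Base
```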
Conceptually a Class object consists of:
MRI defines the following flags for an object:
"Singleton" is set on objects of type Class that are meta-classes.
Mark and Finalize are used by the garbage collector. "Taint" is used for the "taint mode" - depending on the security level, some objects may be marked as "tainted" and thus unsafe, and certain operations may be disallowed.
"Exivar" is used for objects of the "fake" builtin classes to indicate that this instance has an external instance variable hash table that must be looked up elsewhere.
"Freeze" prevents an object from being modified.
Of these, only Singleton, Taint and Freeze represent behavior that is "observable" from inside Ruby, as opposed to when delving under the hood of MRI.
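Two of those observable flags can be seen with a few lines of Ruby. This is only a sketch of the visible behaviour, not of how MRI stores the flags. Taint is left out because the taint mechanism was removed from later Ruby versions.

```ruby
# Freeze: a frozen object rejects modification.
name = "ruby".dup.freeze
p name.frozen?              # => true
begin
  name << "!"
rescue StandardError => e   # TypeError on 1.8, FrozenError on recent versions
  puts "rejected: #{e.class}"
end

# Singleton: a metaclass identifies itself when inspected.
meta = class << Object; self; end
p meta                      # => #<Class:Object>
```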
This makes MRI method calls extremely expensive. The method cache alleviates some of the cost, but the remaining work is still far more expensive than in C++, for example. There the typical cost is an indirect pointer lookup and a call instruction. The worst case is a virtual method in a class with multiple inheritance, which costs an indirect pointer lookup plus a call to a "thunk" function that fixes up the object pointer (a couple of instructions at most) so the method can access the right data, and then jumps to the real method. Some of MRI's overhead is avoidable in a more streamlined implementation, but quite a bit of it is extremely hard or impossible to get rid of within the required semantics of Ruby.
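To get a feel for why a cache helps, here is a toy model of the lookup written in Ruby. LOOKUP_CACHE and find_owner are hypothetical names, and this is not MRI's actual algorithm: the real lookup and cache live in C, handle per-object metaclasses, and must be invalidated when classes change, all of which this sketch ignores.

```ruby
# Cache of [class, method name] => module that defines the method.
LOOKUP_CACHE = {}

def find_owner(obj, name)
  key = [obj.class, name]
  return LOOKUP_CACHE[key] if LOOKUP_CACHE.key?(key)

  # Slow path: walk the class, its included modules and its superclasses
  # in order until one of them defines the method.
  owner = obj.class.ancestors.find do |mod|
    mod.instance_methods(false).include?(name) ||
      mod.private_instance_methods(false).include?(name)
  end
  LOOKUP_CACHE[key] = owner
end

class Animal
  def speak
    "..."
  end
end

class Dog < Animal; end

p find_owner(Dog.new, :speak)    # => Animal (found by walking the chain)
p find_owner(Dog.new, :speak)    # => Animal (answered from the cache)
p find_owner(Dog.new, :puts)     # => Kernel
p Dog.new.method(:speak).owner   # => Animal, Ruby's own answer
```

Even in this simplified form, the uncached path does a list search per link in the chain, which is far more work than the single indirect lookup of a C++ virtual call.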
#!/usr/bin/env ruby
require 'benchmark'

class Foo
  # Plain method, dispatched normally.
  def test1
  end

  # test2 is never defined, so calls to it end up here.
  def method_missing(sym)
    if sym == :test2
    end
  end
end

# test3 is created dynamically from a block.
Foo.class_eval do
  define_method :test3 do
  end
end

n = 1000000
f = Foo.new

Benchmark.bmbm(20) do |x|
  x.report("test1") { n.times { f.test1 } }
  x.report("test2 (method_missing)") { n.times { f.test2 } }
  x.report("test3 (define method)") { n.times { f.test3 } }
end
And here are the results from one of my machines:
$ ruby send_vs_method_missing.rb
Rehearsal ----------------------------------------------------------
test1                    1.390000   0.390000   1.780000 (  1.788041)
test2 (method_missing)   1.660000   0.460000   2.120000 (  2.626747)
test3 (define method)    1.880000   0.450000   2.330000 (  2.326830)
------------------------------------------------- total: 6.230000sec

                             user     system      total        real
test1                    1.320000   0.450000   1.770000 (  1.777432)
test2 (method_missing)   1.660000   0.480000   2.140000 (  2.124609)
test3 (define method)    1.870000   0.450000   2.320000 (  2.326343)
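To read those numbers as relative cost, the "real" column of the second run can be divided by the plain method call. The snippet below does only that arithmetic on the figures above; TIMINGS is a name chosen for this sketch.

```ruby
# Wall-clock ("real") seconds from the second, non-rehearsal run.
TIMINGS = {
  "test1"                  => 1.777432,
  "test2 (method_missing)" => 2.124609,
  "test3 (define method)"  => 2.326343
}

baseline = TIMINGS["test1"]
TIMINGS.each do |label, seconds|
  printf("%-24s %.2fx\n", label, seconds / baseline)
end
```

On this one machine that works out to roughly 1.20x for method_missing and 1.31x for define_method. The cost of the n.times loop and its block is included in all three figures, so the difference per call is somewhat larger than these ratios suggest.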