Writing a (Ruby) compiler in Ruby bottom up

This is part of a series I started in March 2008 - you may want to go back and look at older parts if you're new to this series.

Re-opening class definitions

The main reason for handling re-opening of class-definitions at this stage, is for a simple reason:

We're going to attack eigenclasses soon, and being able to handle class re-opening makes that a lot simpler in addition to being needed in and of itself too.

Part of the reason is that I made the choice way back in part 19 to use vtables as the basic method for dispatch, and while vtables make for fast method calls, there's extra book-keeping required if we allow re-defining a method, which Ruby of course does.

But how does that affect eigenclasses?

A quick refresher on vtables, and how this fits with Ruby

As you may or may not remember, this involves an array in the class object with a pointer to each method. This is what an instance, and its class (which is itself an instance of Class) looks like today:

(the left column represents the "slot" in the object structure. With our current 32-bit backend, this is a multiple of 4 bytes offset from the start of the structure)

This is the standard approach for virtual method dispatch in languages like C++. There it is easier than in Ruby for a few reasons:

The class hierarchies are rarely as deep, since they're not all rooted in a single class (Object previously, these days BasicObject). Furthermore, not all methods are virtual, and member functions that are not virtual do not typically get vtable slots. This substantially reduces the likely maximum size of the vtables.
You're not expected to be able to safely call any method on any object. If you do stupid stuff like statically casting objects to incompatible types, your application will crash when you call the wrong methods on it. But in Ruby, that's expected to be "safe" (as in, the worst case is a method_missing. This means a vtable in C++ only needs to contain space for virtual member functions up to and including the current class. New ones added in sub-classes does not impact the size.
You're not expected to be able to re-open classes and make further definitions, and certainly not allowed to define the same method multiple times for the same class (with the result of overwriting past versions)

It's this last one that makes re-opening classes and Eigenclasses tangled up for us. Consider these two variants:


    class Foo
      # Here self == Foo
      class << self
         # Here self == Foo's eigenclass
         def foo
         end
      end
      # Here self == Foo again
    end
    
    class Foo
       
       def self.foo
       end
       
    end

While in the latter case it's not as explicit, in both cases we're handling first class Foo, then opening Foo's eigenclass, then handling class Foo again. While we can optimize away some cruft here later, to merge processing of adjacent definitions (so that e.g. defining self.foo and self.bar right after each other maintains the same scope), that's extra complexity that does not solve the wider issue.

Re-opening classes with vtables

There is one obvious issue with trying to re-open a class with vtables: What about sub-classes?

Consider our earlier example fleshed out a bit more:

What do we now have to do if we want to override, say, vtable slot 2 in Foo? Well, that depends on Bar:

If Bar has overridden vtable slot 2, then we just update Foo
If Bar has not overridden vtable slot 2, then we need to update Foo and Bar.

So we need to expand the Class instance variables to let us find the sub-classes of any given class. As an example, I've added in a Baz class which is a second subclass of Foo. So this:


    
        class Foo
        end
        
        class Bar < Foo
        end
        
        class Baz < Foo
        end
        
        ob = Bar.new

turns into this:

This leaves us with these steps to set a vtable slot:

Does this class have sub-classes?
If yes
- Find the current vtable entry for the current class, lets call it "cur", and the new entry "new", and the vtable slot entry "slot"
- For each subclass "c":
  - if c[slot] == cur (the slot has not been overridden)
    - Recurse and check all subclasses of "c".
    - Set c[slot] = new
  - if c[slot] != cur
    - the slot has been overridden, so we don't do anything else for this class
- Override the current slot for this class

(Instead of using a pointer to the first subclass, and a chained linked list like this, we could use a hash table or array, and we might very well do that at some point, but not now).

Test case

I've added a test case in 02021ad which simply exercises the basic scenarios by creating a class and two sub-classes - one for which a method is overridden - and then re-opens the base class and overrides one of the original methods.

Globals

One of the first things to overhaul is the method I introduced way back at the start of this series to keep track of functions we want to output.

While Ruby does not have functions, we turn every method we find statically into something that looks much more like a function, and name it in a reasonably human-friendly way in part to make it easier to debug the assembler output.

You can check out most of this (all of the handful of lines) yourself, apart from the new globals.rb which contains the meat in this method (in b88a31b):


      # Returns the actual name
      def set(fname, function)
        suffix = ""
        i = 0
        while @global_functions[fname + suffix]
          i += 1
          suffix = "__#{i}"
        end
        fname = fname + suffix
    
        # add the method to the global list of functions defined so far
        # with its "munged" name.
        @global_functions[fname] = function
        fname
      end

All this does is append __ to a desired global function name and return the (possibly munged) name.

The reason we need to do this is that we now of course can have multiple Foo#bar methods at various stages of the execution.

New class object vs. re-opening

I also snuck in another minor change in the preceding commit (b88a31b)

To allow re-opening classes, we need to not blindly create new class objects as we did before (though we're still assuming the class and its name is/was possible to determine statically. This is the change, which basically wraps an if check around the old version:


    -    compile_exp(scope, [:assign, name.to_sym, [:sexp,[:call, :__new_class_object, [cscope.klass_size,superclass,ssize]]]])
    +    compile_eval_arg(scope, [:if,
    +                             [:sexp,[:eq, name, 0]],
    +                             # then
    +                             [:assign, name.to_sym,
    +                              [:sexp,[:call, :__new_class_object, [cscope.klass_size,superclass,ssize]]]
    +                             ]
    +                            ])

Updating the class objects / vtable

Next up we assign two extra instance variable slots in the class objects:


      # slot 4 is reserved for subclasses
      # slot 5 is reserved for next_sibling
      CLASS_IVAR_NUM = 6

You'll find the remaining bits in 4733be0, and we'll look at lib/core/class.rb in more detail.

Firstly, this part of __new_class_object is new:


    # Sub-classes
      (assign (index ob 4) 0) 
      (if (eq superclass 0)
         (assign (index ob 5) 0)
         (do
            # Link in as subclass:
            (assign (index ob 5) (index superclass 4))
            (assign (index superclass 4) ob)
            )
    )

Basically it simply links this class into the list of sub-classes for it's superclass, by prepending it to the head.

The more interesting bit is __set_vtable. This was previously the over-simplistic:


    %s(defun __set_vtable (vtable off ptr)
      (assign (index vtable off) ptr)
      )

Now it looks like this:


    %s(defun __set_vtable (vtable off ptr)
       (let (p)
        (assign p (index vtable 4))
        (while (sexp p)
           (do
              (if (eq (index p off) (index vtable off)) (__set_vtable p off ptr))
              (assign p (index p 5))
           )
        )
      (assign (index vtable off) ptr)
    ))

This boils down to this "pseudo-Ruby" which follows the algorithm sketched out earlier:


       p = vtable.subclasses
       while p
          if p[off] == vtable[off]; __set_vtable(p,off,ptr); end
          p = p.next_sibling
       end
       vtable[off] = ptr

And that's it!

Next up: Eigenclasses.

Writing a (Ruby) compiler in Ruby bottom up - step 42 2014-12-07