Writing a (Ruby) compiler in Ruby bottom up

This is part of a series I started in March 2008 - you may want to go back and look at older parts if you're new to this series.

Putting Self In A Register

Where we left off, we had a simple but tolerable register allocator. But as it happens, for now at least the code will not often get all that huge benefits from it, for the simple reason that a huge amount of a typical Ruby application consists of method calls, and most of these method calls causes us to have to evict most of the registers.

However, there is one obvious candidate for the register treatment that we have alreay touched on:

self.

self is even more ubiquitous in Ruby programs than the source lets on: In addition to every method call, every access to an instance variable is conceptually either <self offset> or <self.some_hidden_hash_table var>. Every class variable requires self.class, and thus self. And so on.

And when calling a method on another object, self just temporarily changes - usually the "new self" will be used a number of times before we return.

So then, this is why I "reserved" a register for caching self - we'll use %esi.

I originally intended to include it in the register allocation part, but when that ballooned to 6k words (including source, to be fair), I decided enough was enough.

In comparison, this part will be pleasantly short and simple.

Changing the calling convention

More than just caching self, we will change the calling convention somewhat:

For now, while we still retain self on the stack (I'm not sure whether I'll keep it that way, but I'd rather avoid messing with it right now), we also put it in %esi before we call the method.

The reasons for doing so are fairly simple:

It means we can assume self == %esi throughout a method, except immediately surrounding a method call on another object.
We have been loading the object value into a register in order to use an indirect load to get self.class anyway, so we might as well use that load to load self straight into %esi and leave it there for the duration.
Since %esi is callee saved, we can trust that it won't suddenly change on us in a method call or function call to external code - when the code returns, %esi should still refer to the object we just called a method on
It makes the code pleasantly simple...

Since %esi is callee saved, it means that we should be able to defer reloading self into it after a call until/unless we know it is needed. This will save some reloads, and may also allow it to be reused on occasion when we call multiple methods on the same object in succession. I'll leave that optimization for later, though.

There are a few parts to this change:

Ensure %esi is reloaded on method exit; we'll do this in #output_functions
Changing #compile_callm to load the "new self" for the called method as needed, and reload self if needed
adding a few helpers to Compiler and Emitter as part of refactoring that code
a minor change to the new register allocator in order to recognize %esi's special status.

We'll go through them "from the top" (of the call flow) anyway. You can find all the code for this part in this single commit: 4b190694bc

output_functions

The only change here is that we want to make sure that %esi is returned to its original state.


              @e.comment("Reloading self if evicted:")
              # Ensure %esi is intact on exit, if needed:
              reload_self(fscope)

Note that this is slightly broken: In the case of defun nodes, it will load the global self, but for these nodes, %esi will actually originally contain the calling self. We'll sort that out later - for now theres a little workaround at the end of #compile_defun. This triggers unnecessary self reloads, but it works:


        @e.evict_regs_for(:self)
        reload_self(scope)

compile_callm

The change to compile_callm is pleasantly small. It only involves the block that's called back from #compile_callm_args to handle the actual call, and should be reasonably self-explanatory (but I'll explain it anyway):


          compile_callm_args(scope, ob, args) do

Here, if we're calling a method on an object named anything other than self, we load the address of the object from the top of the stack, where it has been stuffed as part of preparing the arguments anyway. Do note that this is in many cases going to be wasteful: If the method only has static arguments (or no arguments), we can safely load it into %esi first, and then put it onto the stack from %esi, potentially (unless the object is already in a register) saving a memory load. That's one of many low hanging fruits for optimization later on.


            if ob != :self
              @e.load_indirect(@e.sp, :esi)

If, on the other hand, we called a method on self, we "reload" self if needed.


            else
              @e.comment("Reload self?")
              reload_self(scope)
            end

#reload_self is just a tiny helper that calls #get_arg(scope,:self) for now, and relies on the register allocation in #get_arg to reload %esi with self if it has been evicted.

And here's our cleaned up code to load self.class, and call the right vtable offset.


            load_class(scope) # Load self.class into %eax
            @e.callm(off)

Finally, if the object wasn't self, we ensure self is evicted from %esi so that it will be reloaded later if needed.


            if ob != :self
              @e.comment("Evicting self")
              @e.evict_regs_for(:self)
            end
          end

The only other change in Compiler is to evict self when starting a new scope for classes - see the start of #compile_class

Emitter

For the emitter, I've added more debug output as comments to the code - I'll ignore those changes, you can find them in the commit.

Other than that, I've made the following changes.

First, a new #callm method that moves the offset calculations etc. out of Compiler, where it didn't belong:


      # Call an entry in a vtable, by default held in %eax
      def callm(off, reg=:eax)
        @allocator.evict_caller_saved
        emit(:call, "*#{off*Emitter::PTR_SIZE}(%#{reg.to_s})")
      end

Other than that, I've just added a single line to Emitter#func:


        @allocator.cache_reg!(:self)

This ensures that the compiler treats %esi as already holding self when entering a method.

RegisterAllocator

To the register allocator there are a few more changes.

First I've added the name of the cached variable to the cache. I've also added an instance variable holding the register set aside for self, and started refactoring to handle the caller saved variables separately. See #evict_by_cache and #evict_vars for that change.

The main changes, though, are to give self special treatment as a variable with its own special register that won't otherwise be handed out.

This is mainly handled here in #cache_reg!:


    @@ -174,13 +191,18 @@ class RegisterAllocator
       # available, the emitter will use its scratch register (e.g. %eax)
       # instead
       def cache_reg!(var)
    -    if !@order.member?(var.to_sym)
    +    var = var.to_sym
    +    if var == :self
    +      free = @selfreg
    +    elsif !@order.member?(var)
           return nil
    +    else
    +      free = @free_registers.shift
         end
    -    free = @free_registers.shift
    +

self in action

Let us finally take a look at what some of our now slightly less horrible code looks like.

This is one of our test cases, found in features/inputs/method2.rb:


    class Foo
    
      def initialize
      end
    
      def foo
        puts "foo"
      end
    
      def bar arg
        puts arg
      end
    end
    
    f = Foo.new
    f.foo
    f.bar("bar")

And here is the asm output for Foo#bar. You can see the copious comments that the register allocation emits, but I've added some extra commentary inline anyway:


    __method_Foo_bar:
        .stabn  68,0,12,.LM463 -.LFBB126
    .LM463:
    .LFBB126:

At this point %esi holds self, but as a "backup" [:arg,0] in the compiler, which is mapped at 8(%ebp), also holds %esi. Though in this case, we luckily won't need it, as you can see self remains in %esi throughout:


        pushl   %ebp
        movl    %esp, %ebp
        subl    $20, %esp
        # callm :self.:puts
        subl    $12, %esp
        movl    $3, %ebx
        pushl   %ebx

At this point, the compiler is trying to push self onto the stack, and when it checks the register cache, it tells us self is already in %esi, courtesy of explicitly marking it as cached in Emitter#func. it then proceeds to fill in the arguments:


        # RA: Already cached 'esi' for self
        movl    %esi, 4(%esp)
        movl    $0, 8(%esp)
        # RA: Allocated reg 'ecx' for arg
        # [:arg, 2, :ecx]
        movl    16(%ebp), %ecx
        movl    %ecx, 12(%esp)
        popl    %ebx

Here it is checking if it needs to reload self into the register. If the arguments had contained expressions that included calls to methods of other objects, that might have been necessary. But not this time. So it loads self.class via an indirect load from (%esi) (as a reminder, if esi and eax was C variables, movl (%esi), %eax) is equivalent to eax = esi[0] or eax = *esi). Then it does the method call, via the vtable - the address of #puts have happened to get allocated to slot 36.


        # Reload self?
        # RA: Already cached 'esi' for self
        movl    (%esi), %eax
        call    *36(%eax)
        addl    $12, %esp
        # callm self.puts END
        addl    $20, %esp

Finally, we need to check if there's a chance %esi has been evicted but not reloaded yet, so we can ensure %esi has the right value when we return back up to the calling method. In this case, %esi already holds the right thing:


        # Reloading self if evicted:
        # RA: Already cached 'esi' for self
        leave
        ret

Let's take a look at another example too, namely the start of Object#puts (note that Object is not the right place for #puts - it is there temporarily because we don't support modules yet):


    __method_Object_puts:
        .stabn  68,0,36,.LM347 -.LFBB65
    .LM347:
    .LFBB65:
        pushl   %ebp
        movl    %esp, %ebp
        pushl   %ebx
        subl    $36, %esp
        .stabn  68,0,37,.LM348 -.LFBB65
    .LM348:
        subl    $4, %esp
        movl    $1, %ebx
        # RA: Allocated reg 'edi' for numargs
        # [:lvar, -1, :edi]
        movl    -4(%ebp), %edi
        movl    %edi, (%esp)
        movl    $__get_fixnum, %eax
        call    *%eax
        addl    $4, %esp

Here we see what currently happens when we call a function defined with %s(defun ...). Because I've been too lazy to ensure it doesn't load the "wrong" self before exit, I added the workaround to forcibly reload self after the call, and that is what is happening here. It's wasteful, so I'll address this soon:


        # RA: Evicted self
        # RA: Allocated reg 'esi' for self
        # [:arg, 0, :esi]
        movl    8(%ebp), %esi

Here comes a call to #==, which gets a double whammy, as the other argument is a literal 2, and so triggers a call to __get_fixnum:


        movl    %eax, -20(%ebp)
        .stabn  68,0,39,.LM349 -.LFBB65
    .LM349:
        # callm :na.:==
        subl    $12, %esp
        movl    $3, %ebx
        pushl   %ebx
        # RA: Allocated reg 'ecx' for na
        # [:lvar, 3, :ecx]
        movl    -20(%ebp), %ecx
        movl    %ecx, 4(%esp)
        movl    $0, 8(%esp)
        subl    $4, %esp
        movl    $1, %ebx
        movl    $2, (%esp)
        movl    $__get_fixnum, %eax
        call    *%eax
        addl    $4, %esp
        # RA: Evicted self
        # RA: Allocated reg 'esi' for self
        # [:arg, 0, :esi]
        movl    8(%ebp), %esi
        movl    %eax, 12(%esp)

This code is still far from optimal. Ignoring all the other problems, in terms of the self caching, we evict self on return from a call node just because the current code doesn't maintain the calling convention properly. But after the previous part, I want to keep this short and focused, so lets leave it here for now.

Next time

... we will revisit an old goal: Self-hosting.

First I'll land a number of changes that will make the parser able to parse the entire compiler (whether or not it parses it correctly or not remains to be seen, and is outside the scope).

Secondly, we'll start looking at small parts of the compiler to break into pieces that are:

Amenable to separate testing
Simple enough to try to compile

The goal is to lay the ground work for refactoring what we now have into smaller units that at the same time both allow us to unit test the module and use it as a test case for the compiler at the same time. Each module we can compile will then both give increased confidence in the generated code and bring us one step closer to the state where the compiler can fully compile itself. </self.some_hidden_hash_table>

Writing a (Ruby) compiler in Ruby bottom up - step 34 2014-04-16