This is part of a series I started in March 2008 - you may want to go back and look at older parts if you're new to this series.
Where we left off, we had a simple but tolerable register allocator. But as it happens, for now at least the code will not often get all that huge benefits from it, for the simple reason that a huge amount of a typical Ruby application consists of method calls, and most of these method calls causes us to have to evict most of the registers.
However, there is one obvious candidate for the register treatment that we have alreay touched on:
self.
self
is even more ubiquitous in Ruby programs than the source lets on: In addition
to every method call, every access to an instance variable is conceptually either
<self offset>
or <self.some_hidden_hash_table var>
. Every class
variable requires self.class
, and thus self
. And so on.
And when calling a method on another object, self
just temporarily changes - usually the "new self"
will be used a number of times before we return.
So then, this is why I "reserved" a register for caching self
- we'll use %esi
.
I originally intended to include it in the register allocation part, but when that ballooned to 6k words (including source, to be fair), I decided enough was enough.
In comparison, this part will be pleasantly short and simple.
More than just caching self
, we will change the calling convention somewhat:
For now, while we still retain self on the stack (I'm not sure whether I'll keep it that way,
but I'd rather avoid messing with it right now), we also put it in %esi
before we call the
method.
The reasons for doing so are fairly simple:
self == %esi
throughout a method, except immediately surrounding
a method call on another object.self.class
anyway, so we might as well use that load to load self
straight into
%esi
and leave it there for the duration.%esi
is callee saved, we can trust that it won't suddenly change on us in a
method call or function call to external code - when the code returns, %esi
should still
refer to the object we just called a method onSince %esi
is callee saved, it means that we should be able to defer reloading self
into
it after a call until/unless we know it is needed. This will save some reloads, and may also
allow it to be reused on occasion when we call multiple methods on the same object in succession.
I'll leave that optimization for later, though.
There are a few parts to this change:
%esi
is reloaded on method exit; we'll do this in #output_functions
#compile_callm
to load the "new self" for the called method as needed, and reload self
if neededCompiler
and Emitter
as part of refactoring that code%esi
's special status.We'll go through them "from the top" (of the call flow) anyway. You can find all the code for this part in this single commit: 4b190694bc
The only change here is that we want to make sure that %esi
is returned to its original state.
@e.comment("Reloading self if evicted:")
# Ensure %esi is intact on exit, if needed:
reload_self(fscope)
Note that this is slightly broken: In the case of defun
nodes, it will load the global self
,
but for these nodes, %esi
will actually originally contain the calling self
. We'll sort
that out later - for now theres a little workaround at the end of #compile_defun
. This
triggers unnecessary self
reloads, but it works:
@e.evict_regs_for(:self)
reload_self(scope)
The change to compile_callm
is pleasantly small. It only involves the block that's called
back from #compile_callm_args
to handle the actual call, and should be reasonably
self-explanatory (but I'll explain it anyway):
compile_callm_args(scope, ob, args) do
Here, if we're calling a method on an object named anything other than self
, we load
the address of the object from the top of the stack, where it has been stuffed as part
of preparing the arguments anyway. Do note that this is in many cases going to be wasteful:
If the method only has static arguments (or no arguments), we can safely load it into %esi
first, and then put it onto the stack from %esi
, potentially (unless the object is
already in a register) saving a memory load. That's one of many low hanging fruits for
optimization later on.
if ob != :self
@e.load_indirect(@e.sp, :esi)
If, on the other hand, we called a method on self
, we "reload" self
if needed.
else
@e.comment("Reload self?")
reload_self(scope)
end
#reload_self
is just a tiny helper that calls #get_arg(scope,:self)
for now,
and relies on the register allocation in #get_arg
to reload %esi
with self
if it has been evicted.
And here's our cleaned up code to load self.class, and call the right vtable offset.
load_class(scope) # Load self.class into %eax
@e.callm(off)
Finally, if the object wasn't self
, we ensure self
is evicted from %esi
so that
it will be reloaded later if needed.
if ob != :self
@e.comment("Evicting self")
@e.evict_regs_for(:self)
end
end
The only other change in Compiler
is to evict self
when starting a new scope for
classes - see the start of #compile_class
For the emitter, I've added more debug output as comments to the code - I'll ignore those changes, you can find them in the commit.
Other than that, I've made the following changes.
First, a new #callm
method that moves the offset calculations etc. out of Compiler
,
where it didn't belong:
# Call an entry in a vtable, by default held in %eax
def callm(off, reg=:eax)
@allocator.evict_caller_saved
emit(:call, "*#{off*Emitter::PTR_SIZE}(%#{reg.to_s})")
end
Other than that, I've just added a single line to Emitter#func
:
@allocator.cache_reg!(:self)
This ensures that the compiler treats %esi
as already holding self
when entering
a method.
To the register allocator there are a few more changes.
First I've added the name of the cached variable to the cache. I've also added an
instance variable holding the register set aside for self
, and started refactoring
to handle the caller saved variables separately. See #evict_by_cache
and #evict_vars
for that change.
The main changes, though, are to give self
special treatment as a variable with its own
special register that won't otherwise be handed out.
This is mainly handled here in #cache_reg!
:
@@ -174,13 +191,18 @@ class RegisterAllocator
# available, the emitter will use its scratch register (e.g. %eax)
# instead
def cache_reg!(var)
- if !@order.member?(var.to_sym)
+ var = var.to_sym
+ if var == :self
+ free = @selfreg
+ elsif !@order.member?(var)
return nil
+ else
+ free = @free_registers.shift
end
- free = @free_registers.shift
+
Let us finally take a look at what some of our now slightly less horrible code looks like.
This is one of our test cases, found in features/inputs/method2.rb
:
class Foo
def initialize
end
def foo
puts "foo"
end
def bar arg
puts arg
end
end
f = Foo.new
f.foo
f.bar("bar")
And here is the asm output for Foo#bar
. You can see the copious comments
that the register allocation emits, but I've added some extra commentary
inline anyway:
__method_Foo_bar:
.stabn 68,0,12,.LM463 -.LFBB126
.LM463:
.LFBB126:
At this point %esi
holds self
, but as a "backup" [:arg,0]
in the compiler, which is mapped at 8(%ebp)
, also holds %esi
.
Though in this case, we luckily won't need it, as you can see
self
remains in %esi
throughout:
pushl %ebp
movl %esp, %ebp
subl $20, %esp
# callm :self.:puts
subl $12, %esp
movl $3, %ebx
pushl %ebx
At this point, the compiler is trying to push self
onto the
stack, and when it checks the register cache, it tells us self
is already in %esi
, courtesy of explicitly marking it as
cached in Emitter#func
. it then proceeds to fill in the arguments:
# RA: Already cached 'esi' for self
movl %esi, 4(%esp)
movl $0, 8(%esp)
# RA: Allocated reg 'ecx' for arg
# [:arg, 2, :ecx]
movl 16(%ebp), %ecx
movl %ecx, 12(%esp)
popl %ebx
Here it is checking if it needs to reload self
into the register.
If the arguments had contained expressions that included calls to
methods of other objects, that might have been necessary. But not
this time. So it loads self.class
via an indirect load from (%esi)
(as a reminder, if esi
and eax
was C variables, movl (%esi), %eax)
is equivalent to eax = esi[0]
or eax = *esi
). Then it does
the method call, via the vtable - the address of #puts
have happened
to get allocated to slot 36.
# Reload self?
# RA: Already cached 'esi' for self
movl (%esi), %eax
call *36(%eax)
addl $12, %esp
# callm self.puts END
addl $20, %esp
Finally, we need to check if there's a chance %esi
has been evicted
but not reloaded yet, so we can ensure %esi
has the right value when
we return back up to the calling method. In this case, %esi
already
holds the right thing:
# Reloading self if evicted:
# RA: Already cached 'esi' for self
leave
ret
Let's take a look at another example too, namely the start of Object#puts
(note that Object
is not the right place for #puts
- it is there temporarily
because we don't support modules yet):
__method_Object_puts:
.stabn 68,0,36,.LM347 -.LFBB65
.LM347:
.LFBB65:
pushl %ebp
movl %esp, %ebp
pushl %ebx
subl $36, %esp
.stabn 68,0,37,.LM348 -.LFBB65
.LM348:
subl $4, %esp
movl $1, %ebx
# RA: Allocated reg 'edi' for numargs
# [:lvar, -1, :edi]
movl -4(%ebp), %edi
movl %edi, (%esp)
movl $__get_fixnum, %eax
call *%eax
addl $4, %esp
Here we see what currently happens when we call a function defined with %s(defun ...)
.
Because I've been too lazy to ensure it doesn't load the "wrong" self
before exit,
I added the workaround to forcibly reload self
after the call, and that is what is
happening here. It's wasteful, so I'll address this soon:
# RA: Evicted self
# RA: Allocated reg 'esi' for self
# [:arg, 0, :esi]
movl 8(%ebp), %esi
Here comes a call to #==
, which gets a double whammy, as the other argument is a
literal 2
, and so triggers a call to __get_fixnum
:
movl %eax, -20(%ebp)
.stabn 68,0,39,.LM349 -.LFBB65
.LM349:
# callm :na.:==
subl $12, %esp
movl $3, %ebx
pushl %ebx
# RA: Allocated reg 'ecx' for na
# [:lvar, 3, :ecx]
movl -20(%ebp), %ecx
movl %ecx, 4(%esp)
movl $0, 8(%esp)
subl $4, %esp
movl $1, %ebx
movl $2, (%esp)
movl $__get_fixnum, %eax
call *%eax
addl $4, %esp
# RA: Evicted self
# RA: Allocated reg 'esi' for self
# [:arg, 0, :esi]
movl 8(%ebp), %esi
movl %eax, 12(%esp)
This code is still far from optimal. Ignoring all the other problems, in terms of
the self
caching, we evict self
on return from a call
node just because
the current code doesn't maintain the calling convention properly. But after
the previous part, I want to keep this short and focused, so lets leave it
here for now.
... we will revisit an old goal: Self-hosting.
First I'll land a number of changes that will make the parser able to parse the entire compiler (whether or not it parses it correctly or not remains to be seen, and is outside the scope).
Secondly, we'll start looking at small parts of the compiler to break into pieces that are:
The goal is to lay the ground work for refactoring what we now have into smaller units that at the same time both allow us to unit test the module and use it as a test case for the compiler at the same time. Each module we can compile will then both give increased confidence in the generated code and bring us one step closer to the state where the compiler can fully compile itself. </self.some_hidden_hash_table>