This is part of a series I started in March 2008 - you may want to go back and look at older parts if you're new to this series.
It took me two months to push this one out, but that doesn't mean I'm about to stop publishing these again anytime soon. I intend to shorten my "lead time" again, and I'm hoping to publish the next one in 2-3 weeks' time.
We saw in some of the recent parts that the code is getting more and more prone to problems due to attempts at reusing registers.
Register allocation is the proper solution to this: introducing a mechanism for determining which code gets to use which registers when, and handling the situation where too few registers are available (by "spilling" register contents onto the stack).
Let me first make it clear that the really naive alternative to this is to avoid register allocation entirely, and to declare a few registers to be "scratch" registers that can only be safely used within code where no code generation is delegated further down.
This can be done by pushing all intermediate results onto the stack, and popping them off only briefly to carry out the necessary operations and push the result back onto the stack again (optionally with simple optimizations, such as operating directly on the top of the stack for single-operand operations, or, when possible, for the last operand of multi-operand instructions).
Basically, this means you're "emulating" a stack machine with registers.
This is an approach taken by many simple compilers for unoptimized code, deferring the problem of register allocation to optimization passes.
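To make that concrete, here is a minimal sketch - not the compiler's actual code - of what a purely stack-based compile_2 could look like, using the emitter calls we already have, and with %ecx as a fixed scratch register that is never held across further code generation:

def compile_2_stackmachine(scope, left, right)
  @e.pushl(compile_eval_arg(scope, left))         # park the left operand on the stack
  @e.save_result(compile_eval_arg(scope, right))  # right operand ends up in %eax
  @e.popl(:ecx)                                   # left operand into the fixed scratch register
  yield :ecx                                      # e.g. emits "addl %ecx, %eax"
  [:subexpr]
end

Every intermediate result takes a round trip via the stack, so no bookkeeping is needed at all - at the cost of a lot of extra memory traffic.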
In reality we're very close to that, and in some respects I wish I'd kept purely to that approach, not least because it took me two months to get time to wrap up this part properly. But on the other hand, doing very basic register allocation is not that hard, and since we've started down that path we might as well go a bit further.
Only a bit mind you.
In fact, in this part we will refactor some code to reduce its need for register allocation, to the point where we can get away with only minor adjustments to the original, really primitive allocator.
Optimal register allocation is hard, and tends to require analysis that demands, or at least strongly favors, a low-level intermediate representation that allows more in-depth analysis of register usage.
A first step is generally to analyse "liveness" of each variable, including temporary variables introduced by the compiler. Then the problem basically boils down to finding a way to map as many variable references as possible to registers so that you don't exceed the available number of registers for your architecture at any one point.
The "at any one point" part adds complexity, since reuse of registers is central to minimizing the number of memory accesses, which is the main goal of register allocation:
For most systems, memory accesses are substantially slower than register accesses (though this is less of an issue in these days of multi-level processor caches than in the old days, when every memory access went to external memory; when you miss the cache, though, you are far more brutally penalized on modern CPUs, as the gap between CPU performance and memory latency has widened).
If you have n registers, and no more than n variables are "live" at once, then you have it easy - provided you are able to determine which variables are "live".
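As a toy illustration of liveness (made-up Ruby, nothing to do with the compiler itself), the comments note which variables are still needed - are "live" - after each line:

a = f(1)       # live: a
b = f(2)       # live: a, b
c = a + b      # live: c          (a and b are never used again)
d = f(3)       # live: c, d
puts c + d     # live: nothing

Four variables, but never more than two live at the same time, so two registers would be enough to keep everything out of memory in this block.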
The fun begins when (and on an architecture like i386, this is likely to be pretty much all the time) the code block you are allocating registers for is referencing more stuff than you have registers for.
This means either "spilling" the contents of a register to memory in the middle of a function in order to load another variable, or keeping some variables you'd like to have in a register in memory instead, except when actually operating on them.
That's where the challenge arises.
(though Ruby reduces this problem through other inefficiencies: almost everything is accessed via methods, which reduces the number of variables that are easy to put into registers; we may hope to improve on that later, but for now it reduces the immediate benefit of register allocation)
Another typical approach is graph coloring, which I'm not going to go into at all, as it's slow and complicated. A more recent approach is linear scan allocation, covered on the Wikipedia page for register allocation, which is popular for JITs etc. because it is in general a lot faster (and simpler), at the cost of some performance in the generated code.
In any case, down the rabbit hole you go...
I have no desire to deal with that anytime soon, and especially not on an architecture like i386, where the number of registers is so small. On, say, x86-64 or M68k it'd be a different matter: there you have vastly better chances of being able to keep a substantial proportion of intermediate results in registers and largely avoid spilling them to the stack, so you're a real sucker if you don't take maximal advantage of the registers (however, as mentioned above, without other work we'd have problems taking full advantage).
(translation: I'll have to deal with this some day, but not now)
What we will do for now is to mostly rely on keeping stuff in memory (including on the stack), except for loading values into %eax for short periods when operating on them. As needed we'll also use %edx (see below). Beyond that, we will just guess wildly at some heuristics...
Apart from the guessing wildly at some heuristics part, this is roughly what we've done so far anyway...
But so far we've done the register allocation extremely ad-hoc, so it is worth first establishing our actual conventions. We start with the C calling convention, since we're trying to allow our code to call C code directly:
We roughly follow the cdecl calling convention for i386. That is, we will aim for %eax, %ecx and %edx to be caller saved: if we rely on them to maintain their values, we will push them onto the stack before calling another method.
In practice we will treat %eax as a scratch register that is only used for short windows, and so we will rarely if ever want to bother saving it, especially as it is also generally used for the return value anyway.
Everything else is to be callee saved (but see the caveat below); that is, if those registers are used in the called method, they will be saved there, and need to be back to their original values by the time the method returns.
And we store arguments on the stack.
Next we add on some extensions:
Specifically, we store the number of arguments in %ebx, as methods need to be able to determine the precise number of arguments, whether due to "splats" or to throw appropriate exceptions to enforce arity (that is, to ensure that the number of arguments matches the declared number of arguments for the method; we don't do this yet).
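Summarized as data - this hash is purely illustrative and doesn't exist anywhere in the compiler - the conventions we've settled on so far look like this:

CALLING_CONVENTION = {
  :caller_saved => [:eax, :ecx, :edx],        # push these before a call if you rely on them afterwards
  :callee_saved => [:ebx, :esi, :edi, :ebp],  # the called method must restore these before returning
  :return_value => :eax,
  :arguments    => :stack,
  :numargs      => :ebx                       # our extension on top of cdecl
}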
Note that this breaks the cdecl calling convention. However, this is only a problem if C code (or other code conforming to the cdecl calling convention) calls into our code.
For now we're not sticking our head into that worm's nest (though you need to be aware of this if you want to experiment with the code and use C functions like qsort that take callbacks), so it's a non-issue. If/when we do, we'll either need to consider preserving %ebx, or more likely we'll need to bridge calls, as the method will likely require %ebx to hold the number of arguments for more than just the splats to work (e.g. Ruby methods are expected to throw ArgumentError if called with the wrong number of arguments, which will eventually force us to add additional checks at the start of each method).
As a result, %ebx is in practice free other than on entering a function, as we save it to the local stack frame. I'm of two minds about whether to keep passing the number of arguments this way, as the code has gotten to the point where we end up temporarily pushing it onto the stack on call anyway, and pushing it onto the stack again afterwards. The only real current benefit of using a register for it is that we can directly call C functions that are totally unaware of how we pass the number of arguments.
These C functions don't know about the "fake" numargs "variable" anyway, and so we could as an alternative introduce a way of indicating that we want the C calling convention, which would just leave out numargs entirely. It'd be slightly more hassle to use, but it'd only ever be used for low level %s() code anyway.
We're not going to change this right now, regardless.
So which registers do we have to play with? Well, on i386 we have %ecx, %edx, %esi and %edi. The "problem" is that all of them have special purposes in some cases. We've run into one of them already: %edx is used to hold the high 32 bits of the dividend for idivl, as well as the remainder of the result, and so needs special treatment. There are also a number of floating point registers.
We also have %ebp, which we use as the frame pointer to access local variables and method arguments, and %esp, the stack pointer.
As discussed above, we could in theory use %ebx too, with the caveat that as long as the splat handling is the way it is, it'll get clobbered often, so for now I've kept it out.
(There are many other register names on i386, but many of them alias 16- and 8-bit portions of the 32-bit registers referred to above; as someone who grew up with M68k asm, where this kind of thing did not occur, in general I'll pretend I never heard about that wart)
As I said, we'll guess wildly.
As a first approximation, we'll make %esi hold self from the first reference to self in each method. We'll cover the implementation of that in the next part, so for now we will ignore %esi. This is because method calls and instance variable access both require self, and so we can assume it will be referenced quite frequently; as I will show next time, this pans out quite nicely.
I toyed with the idea of making %edi hold self.class, based on the same assumption, but this has two problems: it reduces the set of registers available to us further, and it complicates handling of eigenclasses. The "internal" inheritance chain for Ruby objects can actually change at nearly any time, if someone decides to extend an object with methods tied to the object's metaclass / eigenclass. If we cache self.class, that limits how we can handle that further down the line. So for now I won't do anything about it.
This leaves us with %ecx, %edx and %edi free for allocation so far. We further allow "forced" allocation of specific registers in order to handle cases like idivl mentioned earlier.
At the same time we'll make some small changes that minimize how long temporary values are kept in registers, which ensures we never need many of them.
Then we'll add a "quick and dirty scan" step that simply builds a set of candidate variables by counting how often each variable is referenced and sorting them by frequency (we'll see the details below).
Note that at this point we could go the next step and store information about liveness ranges, and use that to allow register usage to overlap, but one step at a time. The main thing is to get the overall mechanism in place.
So how much work will we do to decide what to keep in registers? As little as possible. On accessing any local variable or argument, we check if it's in a register first; if not, we try to load it into a free register if there is one, and if there isn't we fall back on the current behaviour of using %eax (and reloading / saving the variable on each access).
This totally sidesteps the need for liveness analysis, but also means we don't get the full benefit, and it's easy to create pathological cases where it performs really badly (e.g. lots of variables accessed few times, used one after the other, and then once you've referenced enough variables, reference them all again, right after they've been evicted from the registers; repeat - in this way you can construct cases where you get no benefit from the register allocation at all, while if you'd kept a subset of the variables in the registers throughout the code block, you would). But this is still no worse than the code we currently generate.
A number of methods currently refer to registers through symbols or even strings that reference them directly. A first step is to try to abstract away more of this, so that as much as possible we request registers from the Emitter, and the registers can be replaced later if possible/necessary.
As a bonus, this will set us on the right path to make the compiler easier to retarget to other architectures.
This first changeset adds methods for the stack pointer and for using %ebx as a scratch register, plus a couple of other helpers, and also starts cleaning up the Compiler class so that it doesn't explicitly mention registers by name in so many places. You can see it in 92aa83f - I won't cover that commit in more detail.
We already have with_register, which sort-of tried to do a little bit of very basic register allocation. But with_register comes with substantial caveats: it knows nothing about what to do if it runs out of registers to allocate. So let's try to provoke it to fail, so we have something to fix to start with:
%s(printf "%d = 2\n" (div (div (div 16 2) 2) 2))
I picked div because our recent rewrite away from runtime.c left us with code in compile_div that already does two nested with_register calls, so it's well suited to making us run out of registers in our current regime.
You should get something like this if you try to compile it:
./emitter.rb:364:in `with_register': Register allocation FAILED (RuntimeError)
The problem is simply that we've only set aside %edx and %ecx, and furthermore we're not doing anything to allow spilling the values to the stack, or even to make them available again when calling into another function.
Triggering this in actual Ruby code is harder, as almost everything quickly devolves into method calls that effectively "reset" the register allocations by virtue of the calling convention's requirements for who saves which registers.
But let's sort this out first of all anyway.
First, since this revealed the danger of the nested with_register calls, which will remain inefficient at best, let's address the simpler case of compile_2, which we've used for dual-operand arithmetic and comparisons:
(You'll find these changes and the div test case in 0fdfb34)
def compile_2(scope, left, right)
  src = compile_eval_arg(scope,left)
  @e.with_register do |reg|
    @e.movl(src,reg)
    @e.save_result(compile_eval_arg(scope,right))
    yield reg
  end
  [:subexpr]
end
Just a tiny change here: we've lifted the compile_eval_arg(scope,left) call out. Frankly, we should probably remove the need for the register entirely and just use the stack; I'll consider that later. In the meantime this reduces register usage substantially in cases where the left-hand expression tree is deep.
The bigger change is compile_div:
def compile_div(scope, left, right)
  @e.pushl(compile_eval_arg(scope,left))
  res = compile_eval_arg(scope,right)

  @e.with_register(:edx) do |dividend|
    @e.with_register do |divisor|
      @e.movl(res,divisor)
      # We need the dividend in %eax *and* sign extended into %edx, so
      # it doesn't matter which one of them we pop it into:
      @e.popl(@e.result)
      @e.movl(@e.result, dividend)
      @e.sarl(31, dividend)
      @e.idivl(divisor)
    end
  end
  [:subexpr]
end
This is pretty much a rewrite - the old version was horribly messy and brittle, because it tried to work around not being able to tell with_register exactly what it wanted. So to simplify, let's make with_register handle that.
And at the same time we lift the argument evaluation out of the actual division, which massively simplifies things:
First we evaluate the left-hand side, and push the result onto the stack. Then we evaluate the right-hand side, and leave the result in %eax.
Then we forcibly allocate %edx, since we know idivl needs it for both arguments and return value. Then we allocate a register for the divisor (the right-hand side). We then get the dividend (the left-hand side) off the stack into %eax, and copy it into the allocated register, which will be %edx. We then shift right so that only the sign bit is left (see last time we discussed the divisions for more on this).
Finally we do the division.
You may note that this is going to be wasteful in cases where the left hand expression is a static value etc. that we could just load directly into the right register. Adding some code to detect that will save some stack manipulation. But that's not a priority now.
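For what it's worth, here is a sketch - untested, and not committed anywhere - of how compile_div could special-case a literal left operand, assuming the emitter's movl accepts an integer immediate the same way sarl does:

def compile_div(scope, left, right)
  if left.is_a?(Fixnum)
    # Literal dividend: skip the pushl/popl dance entirely.
    res = compile_eval_arg(scope, right)
  else
    @e.pushl(compile_eval_arg(scope, left))
    res = compile_eval_arg(scope, right)
  end

  @e.with_register(:edx) do |dividend|
    @e.with_register do |divisor|
      @e.movl(res, divisor)
      if left.is_a?(Fixnum)
        @e.movl(left, @e.result)   # the literal goes straight into %eax
      else
        @e.popl(@e.result)         # otherwise pop the evaluated dividend off the stack
      end
      @e.movl(@e.result, dividend)
      @e.sarl(31, dividend)
      @e.idivl(divisor)
    end
  end
  [:subexpr]
end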
The corresponding change to emitter.rb to expand with_register looks like this for now:
-  def with_register
+  def with_register(required_reg = nil)
     # FIXME: This is a hack - for now we just hand out :edx or :ecx
     # and we don't handle spills.
     @allocated_registers ||= Set.new
-    free = @free_registers.shift
+
+    if required_reg
+      free = @free_registers.delete(required_reg)
+    else
+      free = @free_registers.shift
+    end
+
In other words, this will still blow up spectacularly if we need more registers.
Note that the changes we've done mean the register allocation now only surrounds a handful of instructions doing calculation, and no further calls.
Luckily this means that in reality we shouldn't need more. We'd like more, but most of our register usage will only be for caching variables we've allocated space in memory for.
As such, we know the worst case for the register allocation is that we need to juggle %edx around, as that's the only register we specifically need (so far), for idivl.
As "stage 1" this gets our previous example to compile and run.
For now we'll move on to the meatier part of register allocation: Moving variables into registers.
You're not going to like this part. It is hairy and ugly, and involves changing #rewrite_let_env and find_vars in transform.rb, which are not exactly the prettiest parts of the compiler to begin with.
The relevant part of rewrite_let_env is the simplest:
# We use this to assign registers
freq = Hash.new(0)
vars,env= find_vars(e[3],scopes,Set.new, freq)
We set aside a Hash with 0 as the default value, to keep count of the variable references we see.
Then we assign the frequency data to a new extra field of the AST nodes:
e[3].extra[:varfreq] = freq.sort_by {|k,v| -v }.collect{|a| a.first }
extra is simply a new method added to AST::Expr (in afdcd99):
+
+  def extra
+    @extra ||= {}
+  end
Back to transform.rb: the changes to find_vars are more extensive, but most of them are about threading freq through the recursive calls, plus some refactoring (especially breaking out #is_special_name). The meat of it is this little change:
elsif n.is_a?(Symbol)
  sc = in_scopes(scopes,n)
  freq[n] += 1 if !is_special_name?(n)
Specifically, when we get to a "leaf", if it is not a "special" name as defined in the new utility method, we count it as a variable reference to take into account for the later allocation. You can find the full changes to transform.rb here: 3f2ab88
I've split out the "backend" of the new register allocation code into regalloc.rb in 608adc1. It could probably do with some refactoring already, but that will have to wait. Most of the explanations here can be found in the source as well.
The RegisterAllocator class contains the Cache class, which holds information about the current state of a register that is being used to cache a variable.
class Cache
  # The register this cache entry is for
  attr_accessor :reg

  # The block to call to spill this register if the register is dirty
  # (nil if the register is not dirty)
  attr_accessor :spill

  # True if this cache item can not be evicted because the register is in use.
  attr_accessor :locked

  def initialize reg
    @reg = reg.to_sym
  end

  # Spill the (presumed dirty) value in the register in question,
  # and assume that the register is no longer dirty. @spill is presumed
  # to contain a block / Proc / object responding to #call that will
  # handle the spill.
  def spill!
    @spill.call if @spill
    @spill = nil
  end
end
The RegisterAllocator class itself is initialized like this:
def initialize
  # This is the list of registers that are available to be allocated
  @registers = [:edx, :ecx, :edi]

  # Initially, all registers start out as free.
  @free_registers = @registers.dup

  @cached = {}
  @by_reg = {} # Cache information, by register
  @order = {}
end
The @cached and @by_reg hashes contain the Cache objects, referenced by variable name and by register respectively. @order gets assigned an array of variables in descending order of priority. As previously discussed, this in effect means descending order of the number of times the variable is referenced at this point.
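To make that a little more concrete, here is roughly how the allocator behaves on its own, based on the code in regalloc.rb (illustrative only - the return values are what the implementation suggests, not captured output):

ra = RegisterAllocator.new
ra.order([:a, :b, :c])   # priority order: most frequently referenced variable first
ra.cache_reg!(:a)        # => :edx  (registers are handed out in the allocator's fixed order)
ra.cache_reg!(:b)        # => :ecx
ra.cache_reg!(:c)        # => :edi
ra.cache_reg!(:d)        # => nil   (:d was never registered, and nothing is free anyway)
ra.evict(:a)             # frees :edx again, running a's spill block first if it was marked dirty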
Additionally, #with_register initializes two more instance variables as needed: @allocated_registers and @allocators, which hold information on which registers have been allocated for temporary use, and debug information about the code that called #with_register, respectively.
I won't go through every little method - you can look at the commit (and feel free to ask questions if anything is not clear) - but we'll take a look at #cache_reg!, which is used to cache a variable in a register; evict, which is used to remove a variable from the registers (that is, remove the association in the compiler - we do not generate code to actually clear the register); and #with_register, which is used to allocate temporary registers.
#cache_reg! will only allocate registers for variables that have been "registered" with the register allocator. First we try to obtain a register from the list of free/unallocated/uncached registers:
def cache_reg!(var)
  if !@order.member?(var.to_sym)
    return nil
  end

  free = @free_registers.shift
  if free
    debug_is_register?(free)
If a register is found, we create a new Cache object referencing it and return the register; if not, we (for now) output a warning and return nil.
    c = Cache.new(free)
    @cached[var.to_sym] = c
    @by_reg[free] = c
  else
    STDERR.puts "NO FREE REGISTER (consider evicting if more important var?)"
  end
  free
end
We'll look at how this is used later.
evict has the opposite role: when we need to ensure that a variable is retrieved from memory the next time it is used, we pass it to evict.
We iterate over the variables passed, and try to delete them from the cache. If they were in fact there, we call the spill handler if one was set. The spill handler is any object (such as a Proc or lambda) that responds to #call, and it is its responsibility to store the contents of the register back to memory before the register is freed. It is only assigned if the in-register version of the variable is intentionally modified. We then add the register back to the list of free registers.
For convenience, I've added an #evict_all method that evicts all currently cached variables.
def evict vars
  Array(vars).collect do |v|
    cache = @cached.delete(v.to_sym)
    if cache
      r = cache.reg
      debug_is_register?(r)
      cache.spill!
      @by_reg.delete(r)
      @free_registers << r
    end
    r ? v : nil
  end.compact
end

def evict_all
  evict(@cached.keys)
end
The longest method in the register allocator for now is #with_register. This has been lifted from Emitter and adapted - in fact we covered the changes to it above - but I'll go through it from scratch.
First we check if the client has requested a specific register. If so, we try to retrieve that register; if not, we get the first one available. This is the change we made above to the Emitter version.
If none was immediately available, we go through the cached variables from least frequently used (in this method) to most, and evict the first one that is not "locked" in place.
(Note, there's a potential problem here: the client code does need to be able to handle the case where the preferred register is already taken, as we currently don't go on to evict the variable specifically allocated to that register. I punted on that when changing the version previously in Emitter - we probably should fix it, but not now.)
# Allocate a temporary register. If specified, we try to allocate
# a specific register.
def with_register(required_reg = nil)
  @allocated_registers ||= Set.new

  if required_reg
    free = @free_registers.delete(required_reg)
  else
    free = @free_registers.shift
  end

  if !free
    # If no register was immediately free, but one or more
    # registers is in use as cache, see if we can evict one of
    # them.
    if !@cached.empty?
      # Figure out which register to drop, based on order.
      # (least frequently used variable evicted first)
      @order.reverse.each do |v|
        c = @cached[v]
        if c && !c.locked
          reg = c.reg
          evict(v)
          free = reg
          break
        end
      end
    end
  end
The next part is simply debug output, primarily for the case where we actually run out of registers, which would mean we're trying to use more temporary registers at one time than the allocator has available in total (3 currently). That should not happen as long as we are careful about what we allocate temporary registers for:
  debug_is_register?(free)

  if !free
    # This really should not happen, unless we are
    # very careless about #with_register blocks.
    STDERR.puts "==="
    STDERR.puts @cached.inspect
    STDERR.puts "--"
    STDERR.puts @free_registers.inspect
    STDERR.puts @allocators.inspect
    raise "Register allocation FAILED"
  end
And some more debug support:
  # This is for debugging of the allocator - we store
  # a backtrace of where in the compiler the allocation
  # was attempted from in case we run out of registers
  # (which we shouldn't)
  @allocators ||= []
  @allocators << caller
Finally, we mark the register as allocated, yield to the client code, and free the register again.
  # Mark the register as allocated, to prevent it from
  # being reused.
  @allocated_registers << free

  yield(free)

  # ... and clean up afterwards:
  @allocators.pop
  @allocated_registers.delete(free)
  debug_is_register?(free)
  @free_registers << free
end
The changes to compiler.rb are fairly minor. We will go through the changes to each of get_arg, output_functions, compile_if, compile_let and compile_assign from e1049e3 separately, and intersperse that with how it ties in with the changes to Emitter in the same commit.
Let's start with #get_arg:
@@ -86,7 +86,7 @@ class Compiler
   # If a Fixnum is given, it's an int -> [:int, a]
   # If it's a Symbol, its a variable identifier and needs to be looked up within the given scope.
   # Otherwise, we assume it's a string constant and treat it like one.
-  def get_arg(scope, a)
+  def get_arg(scope, a, save = false)
     return compile_exp(scope, a) if a.is_a?(Array)
     return [:int, a] if (a.is_a?(Fixnum))
     return [:int, a.to_i] if (a.is_a?(Float)) # FIXME: uh. yes. This is a temporary hack
@@ -94,7 +94,23 @@ class Compiler
     if (a.is_a?(Symbol))
       name = a.to_s
       return intern(scope,name.rest) if name[0] == ?:
-      return scope.get_arg(a)
+
+      arg = scope.get_arg(a)
+
+      # If this is a local variable or argument, we either
+      # obtain the register it is cached in, or we cache it
+      # if possible. If we are calling #get_arg to get
+      # a target to *save* a value to (assignment), we need
+      # to mark it as dirty to ensure we save it back to memory
+      # (spill it) if we need to evict the value from the
+      # register to use it for something else.
+
+      if arg.first == :lvar || arg.first == :arg
+        reg = @e.cache_reg!(name, arg.first, arg.last, save)
+        return [:reg,reg] if reg
+      end
+
+      return arg
     end
The comment says almost all that needs to be said. The main thing to notice here is the "save" argument, which we'll see used later by compile_assign. Let's take a look at Emitter#cache_reg!.
First we see if this variable is currently cached in a register. If save was passed, we mark this entry as dirty, so that if the variable is later evicted from the register, we spill the value back to memory. Regardless of whether or not we got a register back, we return, as we certainly don't want to load the value into a register just to overwrite it and then spill it later.
(Note that we could add code to request a register and fill it with the modified value, and immediately mark it dirty; in some cases this might be worthwhile by potentially saving us a load later on, but that's speculative enough that I'd want to do real tests first)
def cache_reg!(var, atype, aparam, save = false)
  reg = @allocator.cached_reg(var)

  if (save)
    mark_dirty(var, atype, aparam) if reg
    return reg
  end
Then we output some comments for debugging purposes and to make it easier to examine the results of the allocation later on (we might strip this out eventually). If the register was not already in the cache, we try to request a register to cache the variable in, and if we get one we load the variable into it:
comment("RA: Already cached '#{reg.to_s}' for #{var}") if reg
return reg if reg
reg = @allocator.cache_reg!(var)
return nil if !reg
comment("RA: Allocated reg '#{reg.to_s}' for #{var}") if reg
comment([atype,aparam,reg].inspect)
load(atype,aparam,reg)
return reg
end
Let us also take a quick look at Emitter#mark_dirty. The most important part here is the lambda that is installed to handle spills: it simply outputs another debug comment, and saves the register back where it came from:
def mark_dirty(var, type, src)
  reg = cached_reg(var)
  return if !reg

  comment("Marked #{reg} dirty (#{type.to_s},#{src.to_s})")
  @allocator.mark_dirty(reg, lambda do
    comment("Saving #{reg} to #{type.to_s},#{src.to_s}")
    save(type, reg, src)
  end)
end
As for Compiler#output_functions, the only big change there is that it now passes the variable frequency information:
varfreq = func.body.respond_to?(:extra) ? func.body.extra[:varfreq] : []
@e.func(name, func.rest?, pos, varfreq) do
So let's see what Emitter#func does with it. It ensures all registers are evicted before we generate code for a new function, as the function obviously can't control where it is called from. We then install the new frequency information in the register allocator. And on the way out again, we evict the registers once more, for good measure. Actually, the latter is necessary in case any of the registers needs to be spilled; the former is a precaution - the registers ought to have been evicted before we get there.
-  def func(name, save_numargs = false, position = nil)
+  def func(name, save_numargs = false, position = nil, varfreq = nil)
     @out.emit(".stabs \"#{name}:F(0,0)\",36,0,0,#{name}")
     export(name, :function) if name.to_s[0] != ?.
     label(name)
@@ -479,12 +518,17 @@ class Emitter
     lineno(position) if position
     @out.label(".LFBB#{@curfunc}")
+    @allocator.evict_all
+    @allocator.order(varfreq)
     pushl(:ebp)
     movl(:esp, :ebp)
     pushl(:ebx) if save_numargs
     yield
     leave
     ret
+
+    @allocator.evict_all
+
     emit(".size", name.to_s, ".-#{name}")
     @scopenum ||= 0
     @scopenum += 1
@@ -494,6 +538,7 @@ class Emitter
   end
The change to Compiler#compile_if is simple: we just need to explicitly pass the register to test, rather than relying on it always being %eax:
def compile_if(scope, cond, if_arm, else_arm = nil)
  res = compile_eval_arg(scope, cond)
  l_else_arm = @e.get_local
  l_end_if_arm = @e.get_local
  @e.jmp_on_false(l_else_arm, res)
  compile_eval_arg(scope, if_arm)
  @e.jmp(l_end_if_arm) if else_arm
  @e.local(l_else_arm)
  compile_eval_arg(scope, else_arm) if else_arm
  @e.local(l_end_if_arm) if else_arm
  return [:subexpr]
end
In compile_let the only change is that we want to ensure we evict all registers that the %s(let ...) node aliases, as otherwise we will be using values from the wrong variable:
@e.evict_regs_for(varlist)
@e.with_local(vars.size) { compile_do(ls, *args) }
@e.evict_regs_for(varlist)
In compile_assign, our only concern is passing a truthy value for "save":
args = get_arg(scope,left,:save)
We'll do one tiny little additional change, and then we'll look at the resulting code.
In e74d92d we introduce Emitter#with_register_for:
def with_register_for(maybe_reg)
  c = @allocator.lock_reg(maybe_reg)
  if c
    comment("Locked register #{c.reg}")
    r = yield c.reg
    comment("Unlocked register #{c.reg}")
    c.locked = false
    return r
  end

  with_register {|r| emit(:movl, maybe_reg, r); yield(r) }
end
The purpose of this is a small optimization for cases where we need a temporary register to hold a variable, but already have the variable in a register. We need to ensure it doesn't get evicted, just as if we had allocated a temporary register.
And here's the only place we use it so far:
def compile_2(scope, left, right)
  src = compile_eval_arg(scope,left)
  @e.with_register_for(src) do |reg|
    # @e.emit(:movl, src, reg)
    @e.save_result(compile_eval_arg(scope,right))
    yield reg
  end
  [:subexpr]
end
If the variable is already in a register, it can save us a movl.
## A quick look at the resulting code ##
Let us compile the example code from earlier. It should give something like this:
__method_Object_foo:
.stabn 68,0,2,.LM336 -.LFBB61
.LM336:
.LFBB61:
pushl %ebp
movl %esp, %ebp
subl $36, %esp
.stabn 68,0,3,.LM337 -.LFBB61
.LM337:
subl $20, %esp
movl $5, -20(%ebp)
movl $2, -8(%ebp)
movl $5, -12(%ebp)
movl $10, -16(%ebp)
# RA: Allocated reg 'edx' for a
# [:lvar, 0, :edx]
movl -8(%ebp), %edx
# Locked register edx
# RA: Allocated reg 'ecx' for b
# [:lvar, 1, :ecx]
movl -12(%ebp), %ecx
Here we see the first uses, and as you can see from the "Locked" comment above, this also made use of the optimization where we'd previously have allocated another register and moved or reloaded the variable:
movl %ecx, %eax
addl %edx, %eax
# Unlocked register edx
And here we're assigning the result of (add a b) back to a, which currently lives in %edx. As a result it is marked "dirty": it needs to be written back to memory when evicted.
# Marked edx dirty (lvar,0)
movl %eax, %edx
# RA: Already cached 'edx' for a
# Locked register edx
# RA: Allocated reg 'edi' for c
# [:lvar, 2, :edi]
movl -16(%ebp), %edi
And then we use a again, from %edx, and start reaping the rewards:
movl %edi, %eax
addl %edx, %eax
# Unlocked register edx
# Marked edx dirty (lvar,0)
movl %eax, %edx
Of course, note above that we could save much more with smarter handling of these registers - in these examples we could have done addl %edi, %edx directly, and saved two further movl's - we have tons of further optimizations to do.
And here are some more examples where we reuse a:
# RA: Already cached 'edx' for a
# Locked register edx
movl $2, %eax
imull %edx, %eax
# Unlocked register edx
# Marked edx dirty (lvar,0)
movl %eax, %edx
subl $8, %esp
movl $2, %ebx
movl $.L83, %eax
movl %eax, (%esp)
# RA: Already cached 'edx' for a
movl %edx, 4(%esp)
movl $printf, %eax
And here we finally spill a back to memory from %edx, right before we call printf, as %edx can be overwritten:
# Saving edx to lvar,0
movl %edx, -8(%ebp)
call *%eax
(Incidentally, this is where liveness analysis makes a big difference: after the last use of a, it'll still get spilled, but that's of course pointless since this is a local variable.)
I'd like to reiterate that this is a trivial and primitive allocator. It misses tons of opportunities, and may do stupid things, like load stuff, use it once, have to evict it, load it again, use it once, have to evict it, and so on.
The important thing, though, is to get some basic infrastructure in place that we can expand on. We can now later add more advanced logic to determine which variables to cache when with much less effort.
Next time, we'll look at another side of this: caching self in %esi, which we'll handle quite differently (and in a much shorter part...)