This is part of a series I started in March 2008 - you may want to go back and look at older parts if you're new to this series.
The way C handles variable-length arguments is to let the caller push as many as it likes onto the stack. It's then up to the callee to make sure it doesn't read too far, and there's really no way for the C function to know how many arguments have been pushed other than by inspecting the arguments themselves. Not a very pleasant situation. It works because the arguments are pushed right to left, so that the leftmost argument is always at the top of the stack.
This makes variadic functions like printf() reasonably easy to write, but it is also a never-ending source of bugs, since the printf() format string can indicate there are more arguments on the stack than is really the case. Most modern C compilers warn about mismatches between printf() format strings and arguments as a result, but this doesn't help you if you write your own variadic functions. Not great.
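As a rough illustration of the right-to-left push order, here is a small Ruby simulation. The array only stands in for the machine stack, and `push_args` is a made-up helper, not part of the compiler:

```ruby
# Simulate a C-style call: arguments are pushed right to left,
# so the leftmost argument ends up on top of the stack.
# (push_args is a hypothetical helper for illustration only.)
def push_args(stack, args)
  args.reverse_each { |a| stack.push(a) }
  stack
end

stack = push_args([], ["%d %d\n", 1, 2])

# The top of the stack is the end of the array.
# It is the format string no matter how many arguments follow it.
top = stack.last
puts top.inspect
```

The callee can always find the first argument, but nothing on the stack says where the arguments stop.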
Of course, so far we've not really cared about safety much, but that doesn't mean I don't want to get there, and in this case it makes it easier to write code too. There are relatively few ways out of this that are not horribly inefficient: a marker that indicates you're at the end of the argument list; a pointer indicating the last position; a count.
None of them are particularly pleasing, but they do the work. Then there's the issue of how to access the extra arguments.
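The three options can be sketched in plain Ruby. This is only a model: the arrays and indexes stand in for stack words and addresses, and all the names are invented for the example:

```ruby
# Three hypothetical ways a callee could learn how many arguments it got.

# Assumed sentinel value; it must never occur as a real argument.
END_MARKER = :end_of_args

# 1. Marker: scan the arguments until the sentinel is found.
def count_by_marker(args)
  args.index(END_MARKER)
end

# 2. Pointer: the caller passes the position just past the last argument.
def count_by_end_pointer(first_index, end_index)
  end_index - first_index
end

# 3. Count: the caller passes the number of arguments directly.
def count_by_count(numargs)
  numargs
end

puts count_by_marker([42, 43, 45, END_MARKER])
puts count_by_end_pointer(4, 7)
puts count_by_count(3)
```

The marker costs a scan and reserves a value, the pointer costs a subtraction, and the count costs nothing to read but has to be passed on every call.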
In Ruby it's done by adding a final argument prefixed with "*". This works like the "splat" operator in expressions in that it turns anything from then on into an array. In some ways it simplifies matters in that the compiler or interpreter "only" needs to know how to stuff the remaining arguments into an array and push a pointer to the array itself onto the stack instead.
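For reference, this is what the splat looks like in ordinary Ruby. The method name `f` and its return value are just for illustration:

```ruby
# The final parameter, prefixed with "*", collects all remaining
# arguments into an array.
def f(test, *arr)
  numargs = 1 + arr.size  # the named argument plus the collected ones
  [test, arr, numargs]
end

p f(123, 42, 43, 45)  # prints [123, [42, 43, 45], 4]
p f(123)              # prints [123, [], 1]
```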
We're not quite so lucky as to have a fully formed object system to help us yet. What are the alternatives?
We're going to go with a halfway solution that is pleasing because it allows us to keep interoperability with C:
    movl %ebx, %eax
    subl %ebp, %eax
    shrl $2, %eax
Before I start going through the code: since I now have a repository at Github, you can see the commit for this code and download the full code from here:
You can also "watch" the repository to keep track of whenever I update it.
The biggest change this time is getting the infrastructure in place to handle modifiers to the arguments. The "Arg" class will take care of the arguments from now on:
    class Arg
      attr_reader :name,:rest

      def initialize name, *modifiers
        @name = name
        modifiers.each do |m|
          @rest = true if m == :rest
        end
      end

      def rest?; @rest; end

      def type
        rest? ? :argaddr : :arg
      end
    end
The next change we have to make is updating the Function class to instantiate Arg objects, and its #get_arg method to handle :numargs so you can get access to the number of arguments:
    class Function
      attr_reader :args,:body

      def initialize args,body
        @body = body
        @rest = false
        @args = args.collect do |a|
          arg = Arg.new(*[a].flatten)
          @rest = true if arg.rest?
          arg
        end
      end

      def rest?; @rest; end

      def get_arg(a)
        if a == :numargs
          return [:int,args.size] if !rest?
          return [:numargs]
        end

        args.each_with_index do |arg,i|
          return [arg.type,i] if arg.name == a
        end

        return nil
      end
    end
Note that we return a constant :int if :rest isn't used, as a tiny little optimization.
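To see what #get_arg hands back in each case, here is a small stand-alone check. It repeats condensed copies of the two classes so it runs on its own; the `nil` bodies and the variable names `f` and `g` are just placeholders for the example:

```ruby
# Condensed copies of Arg and Function so this snippet is self-contained.
class Arg
  attr_reader :name, :rest
  def initialize(name, *modifiers)
    @name = name
    modifiers.each { |m| @rest = true if m == :rest }
  end
  def rest?; @rest; end
  def type; rest? ? :argaddr : :arg; end
end

class Function
  attr_reader :args, :body
  def initialize(args, body)
    @body = body
    @rest = false
    @args = args.collect do |a|
      arg = Arg.new(*[a].flatten)  # :i becomes Arg.new(:i), [:arr, :rest] becomes Arg.new(:arr, :rest)
      @rest = true if arg.rest?
      arg
    end
  end
  def rest?; @rest; end
  def get_arg(a)
    if a == :numargs
      return [:int, args.size] if !rest?
      return [:numargs]
    end
    args.each_with_index { |arg, i| return [arg.type, i] if arg.name == a }
    nil
  end
end

f = Function.new([:test, [:arr, :rest]], nil)  # variadic
g = Function.new([:i, :j], nil)                # fixed arity

p f.get_arg(:test)     # prints [:arg, 0]
p f.get_arg(:arr)      # prints [:argaddr, 1]
p f.get_arg(:numargs)  # prints [:numargs]
p g.get_arg(:numargs)  # prints [:int, 2]
```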
If you look through the rest of the commit on Github you'll see mostly changes to thread the rest? method and `get_arg` changes up into the scope handling. The next interesting bit is this change to LocalVarScope's get_arg:
      def get_arg a
        a = a.to_sym
    -   return [:lvar,@locals[a]] if @locals.include?(a)
    +   return [:lvar,@locals[a] + (rest? ? 1 : 0)] if @locals.include?(a)
        return @next.get_arg(a) if @next
        return [:addr,a] # Shouldn't get here normally
      end
We add one to the offset because we'll later change the emitter to copy %ebx onto the stack when inside the function so that it's accessible as a local variable. Strictly speaking, we don't need to push it on the stack unless we need the register AND actually use :numargs, but we'll leave avoiding that as an optimization for later.
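The effect of the offset shift can be shown with a stripped-down stand-in for the scope class. `SketchScope` and its constructor are hypothetical; the real LocalVarScope gets its rest? information through the scope chain rather than as a constructor argument:

```ruby
# Hypothetical, simplified stand-in for LocalVarScope.
class SketchScope
  # locals maps variable names to slot numbers; rest says whether the
  # enclosing function takes a variable number of arguments.
  def initialize(locals, rest)
    @locals = locals
    @rest = rest
  end

  def rest?; @rest; end

  def get_arg(a)
    a = a.to_sym
    # With a rest argument, slot 0 is taken by the saved argument count,
    # so every local moves one slot further along.
    return [:lvar, @locals[a] + (rest? ? 1 : 0)] if @locals.include?(a)
    [:addr, a]
  end
end

p SketchScope.new({ :i => 0 }, false).get_arg(:i)  # prints [:lvar, 0]
p SketchScope.new({ :i => 0 }, true).get_arg(:i)   # prints [:lvar, 1]
```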
The next interesting bit is this cleaned-up #compile_eval_arg (I really need to find more meaningful names for these methods... sigh):
      def compile_eval_arg scope,arg
        atype, aparam = get_arg(scope,arg)
        return aparam if atype == :int
        return @e.addr_value(aparam) if atype == :strconst
        case atype
        when :numargs
          @e.movl("-4(%ebp)",@e.result_value)
        when :argaddr
          @e.load_arg_address(aparam)
        ...
(I've omitted the end, as the rest were just cleanups - see the commit).
The new thing here is supporting accessing :numargs, and having a way to get the address of an argument. This is used to get the start of the array of the remaining arguments.
The changes in the emitter are minor:
      def load_arg_address(aparam)
        leal(local_arg(aparam),:eax)
      end
In a separate commit I decided to do this change to Function#get_arg:
    return rest? ? [:lvar,-1] : [:int,args.size]
... combined with stripping the asm for :numargs out of #compile_eval_arg. This change was made for the purpose of removing the asm, but also to treat :numargs more like a variable than a language keyword, though that distinction isn't particularly strong.
Here's a program that demonstrates the new functionality:
    require 'compiler'

    prog = [:do,
      [:defun, :f, [:test,[:arr, :rest]],
        [:let,[:i],
          [:assign, :i, 0],
          [:while, [:lt,:i,[:sub,:numargs,1]],
            [:do,
              [:printf, "test=%ld, i=%ld, numargs=%ld, arr[i]=%ld\n",:test,:i,:numargs,[:index, :arr, :i]],
              [:assign, :i, [:add, :i, 1]],
            ]
          ]
        ]
      ],
      [:defun,:g,[:i,:j],
        [:let,[:k],
          [:assign,:k,42],
          [:printf,"numargs=%ld, i=%ld,j=%ld,k=%ld\n",:numargs,:i,:j,:k]
        ]
      ],
      [:f,123, 42,43,45],
      [:g,23,67]
    ]

    Compiler.new.compile(prog)
... and the expected output:
    $ make testargs
    ruby testargs.rb >testargs.s
    as -o testargs.o testargs.s
    testargs.s: Assembler messages:
    testargs.s:156: Warning: unterminated string; newline inserted
    cc -c -o runtime.o runtime.c
    cc testargs.o runtime.o -o testargs
    $ ./testargs
    test=123, i=0, numargs=4, arr[i]=42
    test=123, i=1, numargs=4, arr[i]=43
    test=123, i=2, numargs=4, arr[i]=45
    numargs=2, i=23,j=67,k=42
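As a sanity check on the output of f, here is a plain Ruby analogue of the compiled function, using Ruby's own splat in place of our :rest modifier. It is only a comparison aid, not something the compiler generates:

```ruby
# Plain Ruby version of the test function f, for comparison only.
def f(test, *arr)
  numargs = 1 + arr.size  # same count that :numargs yields in the compiled code
  lines = []
  i = 0
  while i < numargs - 1   # one named argument, so numargs - 1 extras
    lines << format("test=%d, i=%d, numargs=%d, arr[i]=%d", test, i, numargs, arr[i])
    i += 1
  end
  lines
end

puts f(123, 42, 43, 45)
```

Run under a regular Ruby interpreter, this prints the same three test= lines as the compiled program.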
To reiterate, the full source is available in this Github repository. To get the source as of the end of this article, pull or download from this commit.