This is part of a series I started in March 2008 - you may want to go back and look at older parts if you're new to this series.
I promised to get a second part out in December somewhere, and it seems like I only barely made it. I'm still hoping to push two out in January, but that's contingent on actually getting time to finish the next one too. But there'll be at least one in January, sometimes towards the middle of the month. Anyway...
With the new, though basic, Fixnum
support, it makes sense to want to print the numbers
without having to resort to %s()
. Let us add some basic output and conversion functionality.
First lets make compiler.feature
actually run again. This turned out to be an embarrassing
case of me forgetting to check in two files. With 66d56ef you can run these tests again.
Even more embarrassingly that reveals a few regressions - a motivation for the next part, where we will discuss testing og code generation in more detail, since that's definitively a current weak spot.
We will see soon to what extent these regressions affect what comes next, though.
For now we will only support Kernel#puts
and Kernel#print
, and only the simplest of use
cases, though we will actually put them in Object
, again due to lack of include
support.
We will assume that the output field separator ($,
) is always a linefeed, and we will not
do anything to explicitly support array arguments.
We add this test case to compiler.feature
:
puts
puts "Should be one line down due to linefeed above"
print "Hello "
puts 42
print "World\n"
print "This","is",1,"test\n"
puts "This","is","test",2
We hope to see at least part of this:
$ ruby features/inputs/06print.rb
Should be one line down due to linefeed above
Hello 42
World
Thisis1test
This
is
test
2
Instead we get a segmentation fault. There's a number of issues here:
To begin with, we put a stub Fixnum#to_s
with an error, and a String#to_s
which simply returns
self
in place, as well as #puts
and #print
versions in Object
(rather than Kernel
, as
noted above):
See 6af2c31:
def puts str
- %s(puts (callm str __get_raw))
+ raw = str.to_s.__get_raw
+ %s(if raw
+ (puts raw)
+ (puts "")
+ )
+ end
+
+ def print str
+ raw = str.to_s.__get_raw
+ %s(if raw
+ (printf "%s" raw)
+ )
end
These are woefully incomplete, but at least in theory they ought to not faily completely....
However, let us take a look what happens if we run the compiler with --parsetree
:
$ ruby compiler.rb --parsetree --norequire features/inputs/06print.rb
reading from file: features/inputs/06print.rb
%s(do
puts
(call puts (
(sexp (call __get_string .L0))))
(call print (
(sexp (call __get_string .L1))))
(call puts ())
(call print (
(sexp (call __get_string .L2))))
(call print (
(sexp (call __get_string .L3)) (sexp (call __get_string .L4)) (sexp (call __get_fixnum 1))
(sexp (call __get_string .L5))))
(call puts (
(sexp (call __get_string .L3)) (sexp (call __get_string .L4)) (sexp (call __get_string .L6))
(sexp (call __get_fixnum 2))))
)
Some likely problems to begin with:
puts
will likely break horribly as the methods specify a single argument, with
no default value, and we don't trap that in any way.puts 42
line has resulted in an empty argument list (!)Now, while the methods we added are obviously broken, some stuff ought to get executed anyway
(subject to buffering, some of it ought to make it to output too), so lets deal with puts 42
first, then add splats to the methods, try to get just puts
to work, and then see if we get
through any of the above at that point.
Passing --notransform
shows us the likely culprit is the rewrites in transform.rb
"
$ ruby compiler.rb --parsetree --norequire --notransform features/inputs/06print.rb
reading from file: features/inputs/06print.rb
%s(do
puts
(call puts Should be one line down due to linefeed above)
(call print Hello )
(call puts 42)
(call print World\n)
(call print (This is 1 test\n))
(call puts (This is test 2))
)
We'll use that to add tests targetting specific the transforms as a whole, and then specifically
for rewrite_fixnumconst
since that seems like a reasonable place to start.
But first, since running this code from gdb and doing a backtrace consistently shows crashes
related to our __send__
fallback or method_missing
- neither which should get triggered
at this stage, lets change __send__
to make it more resilient to implementation
bugs for now (in 1e23ae0) and give better debug output:
--- a/lib/core/object.rb
+++ b/lib/core/object.rb
@@ -19,7 +19,10 @@ class Object
end
def __send__ sym, *args
- %s(printf "WARNING: __send__ bypassing vtable not yet implemented. Called with %s\n" (callm sym to_s))
+ %s(printf "WARNING: __send__ bypassing vtable not yet implemented.\n")
+ %s(printf "WARNING: Called with %p\n" sym)
+ %s(printf "WARNING: self = %p\n" self)
+ %s(if sym (printf "WARNING: (string: '%s'\n" (callm sym to_s)))
end
Next I've added a basic test case for 'puts 42' and the expected output only to compiler.feature
in ae61dfd
In 5f792c4 I've added a simple test case for the transformations:
| "puts 42" | [:do, [:call, :puts, [[:sexp,[:call, :__get_fixnum, 42]]]]]
Running it gives us this:
expected: [:do, [:call, :puts, [[:sexp, [:call, :__get_fixnum, 42]]]]]
got: [:do, [:call, :puts, [nil]]] (using ==) (RSpec::Expectations::ExpectationNotMetError)
./step_definitions/shunting_steps.rb:32:in `/^the parse tree should become (.*)$/'
transform.feature:10:in `Then the parse tree should become <tree>'
Looking carefully at rewrite_fixnumconst
and comparing to rewrite_strconst
reveals the
source: "FIXME" warning has been subtly broken to use the wrong index.
The correct line reads (in bfecf5d):
e[i] = E[e[i]] if is_call && i > 1
Fixing this gets us the following output instead for the compiler.feature
puts 42
test:
expected: "42\n"
got: "Fixnum#to_s is not implemented\n" (using ==)
... which is reasonable, given that we haven't implemented it yet.
Handling this in pure Ruby should be possible with what we have available, though probably not
very efficiently. And again in pure laziness, let's defer to the C libs snprintf
. And leak
some memory in the process, but what's a little memory leak amongst friends who aren't freeing
anything at all until exit at the moment? ... (yeah, another fun subject we'll have to bite
the bullet on)
This takes care of that (in a2f36be):
def to_s
- "Fixnum#to_s is not implemented"
+ %s(let (buf)
+ (assign buf (malloc 16))
+ (snprintf buf 16 "%ld" @value)
+ (__get_string buf)
+ )
end
Our puts 42
test now works.
We now want to make puts
(with no arguments) and various other variations work. I'll go through
tracking down these regressions in some detail debugging code generation is frankly often one
of the most annoying and time consuming parts of writing a compiler. And just maybe it'll teach
me to write more tests...
So lets start by adding one:
| inputs/01ctrivial.rb | outputs/01ctrivial.txt | A puts with no argument |
Where the input looks like this:
puts
and the output like this:
... and it fails (I've run it directly rather than bothering to let Cucumber run it yet):
$ gdb /tmp/01ctrivial
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i486-linux-gnu"...
(gdb) run
Starting program: /tmp/01ctrivial
Program received signal SIGSEGV, Segmentation fault.
0x00000000 in ?? ()
(gdb) bt
#0 0x00000000 in ?? ()
#1 0x0804c128 in __method_Object_puts () at /home/vidarh/src/compiler-experiments/writing-a-compiler-in-ruby/lib/core/object.rb:36
#2 0x0804a622 in main () at 01ctrivial.rb:1
(gdb) quit
That's def puts str
. First problem is that we're currently expecting an argument,
but we're not getting any. In reality the compiler should throw an ArgumentError
when
the method is declared like this and called with too few arguments. But we've not gotten
that far. Anyway, first step is to change this to def puts *str
. But we still need to
handle the difference in argument numbers and it currently won't be very elegant, as we
don't convert str
to an actual Ruby array or anything, so we have to dip down to %s(numargs)
etc.
We first try to add this to the start of Object#puts
:
%s(assign na (call __get_fixnum (numargs)))
if na == 2
%s(puts "")
return
end
We expect na
to be 2 because of self
+ the optional __closure__
argument we pass
(but that is always present).
This fixes puts
on its own, but breaks puts 42
. The reason is the splat operator -
we can't any longer refer directly to str
and expect to get an object - str
now
refers to an address to an array of objects.
So we change the entire puts
to this:
def puts *str
%s(assign na (__get_fixnum numargs))
if na == 2
%s(puts "")
return
end
%s(assign raw (index str 0))
raw = raw.to_s.__get_raw
%s(if raw
(puts raw)
(puts "")
)
end
Apart from being horribly ugly, this also doesn't handle multiple arguments, but it's a step further.
You find this and test in 6918bf8
So, lets try something more convoluted, like, say, puts "foo", "bar"
.
Uh-oh. Doesn't work. What more, if we add a debug printout like %s(printf "numargs=%d\n" numargs)
we get a nasty surprise: It returns 3. For me at least.
Some probing in the assembler output shows us this:
# callm :self.:puts
subl $20, %esp
movl $4, %ebx
movl self, %eax
movl %eax, (%esp)
movl $0, 4(%esp)
subl $4, %esp
movl $1, %ebx
movl $.L11, %eax
movl %eax, (%esp)
movl $__get_string, %eax
call *%eax
addl $4, %esp
movl %eax, 8(%esp)
subl $4, %esp
movl $1, %ebx
movl $.L12, %eax
movl %eax, (%esp)
movl $__get_string, %eax
call *%eax
addl $4, %esp
movl %eax, 12(%esp)
movl (%esp), %edx
movl (%edx), %edx
movl 36(%edx), %eax
call *%eax
addl $20, %esp
# callm self.puts END
Spot the problem? We use %ebx
to hold the number of arguments we calculate, but with our
new "proper" support for strings, we now call __get_string
on the string argument, and
each one of them causes %ebx
to get clobbered. As, incidentally does __get_string
itself, as we don't actually store this value anywhere. We have two choices: Load it
"closer" (That's generally a good idea anyway) or calculate it closer (for splat arguments).
This again underlines that we need proper register allocation including ability to "spill"
registers to the stack. This is complicated in that if we push and pop values all over the
place, we need to be able to adjust all accesses to %esp
all over.
For now we "fake" spilling of %ebx
by pushing %ebx
indiscriminately onto the
stack, and adjusting the generation of the arguments accordingly "manualy" so that when
we pop it off the stack again everything happens to work. We end up with changing the
central parts of Compiler#compile_callm_args
as follows (in 6901567)
@e.with_stack(args.length+1, true) do
if splat
@e.addl(:ecx,:ebx)
end
# we're for now going to assume that %ebx is likely
# to get clobbered later, in the case of a splat,
# so we store it here until it's time to call the method.
@e.pushl(:ebx)
ret = compile_eval_arg(scope, ob)
@e.save_to_stack(ret, 1)
args.each_with_index do |a,i|
param = compile_eval_arg(scope, a)
@e.save_to_stack(param, i+2)
end
# This is where the actual call gets done
# This differs depending on whether it's a normal
# method call or a closure call.
# But first we pull the number of arguments off the stack.
@e.popl(:ebx)
yield
end
The main change here is the @e.pushl(:ebx)
and @e.popl(:ebx)
lines. But
notice that we change the offsets for self
and method arguments accordingly.
This finally allows us to handle #puts
and #print
reasonably close to can see the result of this in 4511cea:
def puts *str
%s(assign na (__get_fixnum numargs))
if na == 2
%s(puts "")
return
end
na = na - 2
i = 0
while i < na
%s(assign raw (index str (callm i __get_raw)))
raw = raw.to_s.__get_raw
%s(if raw (puts raw))
i = i + 1
end
end
def print *str
%s(assign na (__get_fixnum numargs))
if na == 2
%s(printf "nil")
return
end
na = na - 2
i = 0
while i < na
%s(assign raw (index str (callm i __get_raw)))
raw = raw.to_s.__get_raw
%s(if raw (printf "%s" raw))
i = i + 1
end
end
It's not pretty, and there's certainly a lot of room for improvement. In particular "proper" access to the splat arguments would help tremendously, so that's on the agendy for soon too, I think, after code generation testing and some cleaner register allocation.
While we're at it, we might as well take a look at string interpolation and see what kind of trouble we might run into.
First we add a trivial test case (in c7b4b540bb):
a = "Hello World"
b = 42
puts "#{a}, #{b}"
Unsurprisingly it doesn't work, so lets try compiling it with --norequire --parsetree
:
%s(do
(assign a (sexp (call __get_string .L0)))
(assign b (sexp (call __get_fixnum 42)))
(call puts (
(concat (sexp (call __get_string .L1)) a (sexp (call __get_string .L2)) b)))
)
This is clearly outdated - we'd expect concat
to be a method call on the string.
Let us turn of transformations, to see what the parser actually created:
$ ruby compiler.rb --parsetree --norequire --notransform features/inputs/interpolate.rb
reading from file: features/inputs/interpolate.rb
%s(do
(assign a "Hello World")
(assign b "42")
(call puts (
(concat "" a " , " b)))
)
A-ha. concat
nodes were how we implemented interpolation in the parser. So what we
need is to transform them to generate the equivalent of "".concat(a.to_s).concat(" , ").concat(b.to_s)
Here is how we do that (in 35d93b0):
def create_concat(sub)
right = sub.pop
right = E[:callm,right,:to_s] if !right.is_a?(Array)
return right if sub.size == 0
E[:callm, create_concat(sub), :concat, [right]]
end
def rewrite_concat(exp)
exp.depth_first do |e|
if e[0] == :concat
e.replace(create_concat(e[1..-1]))
end
:next
end
end
We replace the existing (concat a b c ...)
with a series of method calls to create and convert
nodes to strings, and then call String#concat
to create the result.
We for now implement String#concat
like this:
+ def concat(other)
+ %s(do
+ (assign ro (callm other __get_raw))
+ (assign osize (strlen ro))
+ (assign bsize (strlen @buffer))
+ (assign size (add bsize osize))
+ (assign size (add size 1))
+ (assign newb (malloc size))
+ (strcpy newb @buffer)
+ (strcat newb ro)
+ (assign @buffer newb)
+ )
+ self
+ end
While this is simple - it just allocates buffers and copies data into them - it isn't very elegant code. So I might as well admit: Some of the simplicity is because of bugs that rears its head if we make the expressions too advanced - likely due to memory protection issues.
But we will take care of that in our part on testing code generation, next time. For now, this works.