Writing a (Ruby) compiler in Ruby bottom up

This is part of a series I started in March 2008 - you may want to go back and look at older parts if you're new to this series.

To be or not to be nil

One of the big elephants sauntering around the room for a long time has been the issue of how to handle the specifics of how Ruby handles nil, true and false. To a lesser extent this issue also affects numbers, but it is those three values that are most critical right now.

The reason is control flow. So far, we've treated these values the way C does: nil is simply the null pointer; true is any non-zero value, and false is zero (and thus for most practical intents the same as nil.

The problem, of course, is that this is not the way it is in Ruby. true, false and nil are values distinct from the numbers, and they compare with each others and with other values in different ways than in C.

They are also objects. Which means we lose out on some of the simplest ways of doing comparisons and turning the comparison results into a value. We may find people doing things like if <some expression>.nil?.

nil and false both evaluate to false in a conditional, but nil != false.

So far "faking it" have worked, because with a few exceptions like the ones above, the C and Ruby variations are relatively compatible. But it's not a lasting solution.

There is another problem: If we change basic contructs like the s-expression if to work on Ruby objects, we'll find it hard to implement the "plumbing" under Ruby.

Introducing some static typing

Don't bring out the pickaxe just yet (groan). As it happens, our compiler compiles two very different languages: The s-expression inspired low level language used both as the compilation target for Ruby and implementation language for low level features, and Ruby itself.

The former language is de facto typeless, like BCPL: We pass values around with wild abandon, and we even clobber Ruby local variables and instance variables with it, but what meaning these values have depends entirely on usage rather than the type of the variable (as in C) or a type attached to the value itself, like in Ruby.

And as it happens, here lies both the problem and solution to our conundrum from above:

If only the compiler can know when it is dealing with real Ruby values, and when it is dealing with something else, then, e.g. compile_if can generate different code in these situations.

Not only that: We will need this information when we eventually get tired of leaking memory and start adding a garbage collector - otherwise we're stuck with a conservative collector, so we get twice the benefit.

It will also help us contain the "leakage" of untyped values into Ruby, by letting us define and narrow the rules for when and where and how we're allowed to work with them.

As it happens, we don't need a very complicated type-system: For now we can get away with knowing if a reasonable subset of constructs returns either an object or may contain anything.

That's it. That's the grand total of the static typing we'll introduce this time.

However the changes start laying the groundwork for more static typing that we can use for optimizations and sanity checks. Ultimately I wish to relegate the "s-expression plumbing" to a very restricted space.

Apart from just categorizing stuff into two types, there's another limitation too for now: Where we act on type information, we will treat all variables as typed to objects, and all return variables from method calls to be typed as objects. We will implicitly assume that the s-expression syntax will be contained, though we are not yet verifying that. In some cases this will be outright wrong. E.g. this explicitly prevents if foo; bar; end from working correctly if foo is not an object, and happens to contain 0, and in any number of similar instances, so it is likely introducing some regressions (I caught one while writing this - there are probably more).

Introducing Value

First, let's put some basic test cases in place. You'll find them in d22b95f

Then lets start putting our new typing into place. Let's start with a Value class to hold a possibly typed value (in 3ec81cb):


    require 'delegate'
    
    # Used to hold a possiby-typed value
    # Currently, valid values for "type"
    # are :object or nil.
    class Value < SimpleDelegator
      attr_reader :type
    
      def initialize ob, type = nil
        super(ob)
        @type = type
      end
    
      # Evil. Since we explicitly check for Symbol some places
      def is_a?(ob)
        __getobj__.is_a?(ob)
      end
    end

To simplify refactoring, we have it be a delegator, so we only selectively add/change behaviour as needed. For now, the only new thing is that #type will return the associated type tag, or nil. We only support :object for now, to indicate we know the value to be a pointer to a Ruby object.

I'm not going to go through ever detail of the changes in compiler.rb. You can find the full set in 0ded9c1

Apart from a number of changes to return objects of the new Value class, the main things to notice are as follows:

We add :false, :true, and :nil to @global_constants to prevent them from being treated as method calls:


    +    @global_constants << :false
    +    @global_constants << :true
    +    @global_constants << :nil

Next up is this change in compile_if:


    -    @e.jmp_on_false(l_else_arm, res)
    +
    +    if res && res.type == :object
    +      @e.save_result(res)
    +      @e.cmpl(@e.result_value, "nil")
    +      @e.je(l_else_arm)
    +      @e.cmpl(@e.result_value, "false")
    +      @e.je(l_else_arm)
    +    else
    +      @e.jmp_on_false(l_else_arm, res)
    +    end
    +

What's happening here is that instead of assuming an untyped value, we check to see if we know we have an object. If we do, and we come across "if result; ...; else ...; end", we change the code to effectively do the equivalent of:


       if result != nil && result != false
          # if block
       else
          # else block
       end

There's an equivalent change for compile_while.

Furthermore there's a few minor additional changes to scope.rb to prevent true/false/nil from being treated as method calls in 52f31ad3

We also need to check in transform.rb that we're not trying to treat true, false and nil as local variables. See f2af5fc

Changes to the runtime

In order to make these changes work, we also need to modify the runtime in various ways. Most obviously, we need to actually make true, false and nil real objects. We do that in lib/core/core.rb:


    +require 'core/true'
    +true = TrueClass.new # FIXME: MRI does not allow creating an object of TrueClass
    +require 'core/false'
    +false = FalseClass.new # FIXME: MRI does not allow creating an object of FalseClass
    +require 'core/nil'
    +nil  = NilClass.new # FIXME: MRI does not allow creating an object of NilClass.
    +
     # OK, so perhaps this is a bit ugly...
     self = Object.new
     
    @@ -59,9 +66,6 @@ STDERR = 1
     STDOUT = IO.new
     ARGV=7
     Enumerable=8 #Here because modules doesn't work yet
    -nil = 0      # FIXME: Should be an object of NilClass
    -true = 1     # FIXME: Should be an object of TrueClass
    -false = 0    # FIXME: Should be an object of FalseClass

These depends on very basic initial implementations of TrueClass, FalseClass and NilClass - see c356591

Another change is in lib/core/fixnum.rb, where all the comparison operators needs to change:


       def == other
    -    %s(eq @value (callm other __get_raw))
    +    %s(if (eq @value (callm other __get_raw)) true false)
       end

This is because %s(eq ..) etc. does not handle typing yet (and they may not necessarily ever need it), so we use our newly typed %s(if ..) coupled with explicitly returning the right objects instead of the numeric values we'd previously get.

It is important to do this in particular as one of the changes I snuck past in compiler.rb assumes that method calls returns Ruby objects.

Almost done now, but there's also a minor change to lib/core/object.rb to remove the horribly hacky true and false methods we used previously.

To be or not to be and more?

As it happens, we have a few more things to do: %s(and ..) and %s(or ...) needs to take type information into account to be able to generate proper code for e.g.:


        if a and b
            ...
        elsif a or c
            ...
        end

In our new world, a and b (or a && b) will always be true, because both true and false have integer values that are non-null. Similarly a or c / a || c will always be true as well, since both values will be seen to evaluate to true.

First of all, I've added a test case to catch this, in 1dfe043. But one of our other test cases shows a regression as well. features/inputs/strcmp.rb now gives wrong results, because we previously relied on being able to use "plain Ruby" if to check the result of a call to strcmp that we stored in a local variable. But for now at least, we're assuming variables contain objects. We'll likely want to refine that, but for now we'll apply a workaround that will work (in 32ddcde):


        %s(assign res (if (strcmp @buffer (callm other __get_raw)) false true))

By explicitly assigning with the values false or true, from the result of a value that will get an indeterminate type, it will work again.

But lets fix "&&"/"and". Firstly we need to actually store the return value from compile_eval_arg for if_arm and else_arm (in 2ae727d):


    -    compile_eval_arg(scope, if_arm)
    +    ifret = compile_eval_arg(scope, if_arm)
         @e.jmp(l_end_if_arm) if else_arm
         @e.local(l_else_arm)
    -    compile_eval_arg(scope, else_arm) if else_arm
    +    elseret = compile_eval_arg(scope, else_arm) if else_arm

Secondly, we need to determine type based on them. Most importantly, we can only safely return a type that is shared by both of them if both if and else are present (also in 2ae727d):


    -    return Value.new([:subexpr])
    +    # We only return a specific type if there's either only an "if"
    +    # expression, or both the "if" and "else" expressions have the
    +    # same type.
    +    #
    +    type = nil
    +    if ifret && (!elseret || ifret.type == elseret.type)
    +      type = ifret.type
    +    end
    +
    +    return Value.new([:subexpr], type)
       end

Other than that, we're simply just adding our missing compile_or:

(EDIT: This implementation is broken; a correct version will be in part 41)


    +  def compile_or scope, left, right
    +    compile_if(scope, left, false, right)
    +  end

And that's it for this time.

Writing a (Ruby) compiler in Ruby bottom up - step 39 2014-09-30

To be or not to be nil

Introducing some static typing

Introducing Value

Changes to the runtime

To be or not to be and more?