This is part of a series I started in March 2008 - you may want to go back and look at older parts if you're new to this series.
One of the big elephants sauntering around the room for a long time has been
the issue of how to handle the specifics of how Ruby handles nil
, true
and
false
. To a lesser extent this issue also affects numbers, but it is those
three values that are most critical right now.
The reason is control flow. So far, we've treated these values the way C does:
nil
is simply the null pointer; true
is any non-zero value, and false is zero
(and thus for most practical intents the same as nil
.
The problem, of course, is that this is not the way it is in Ruby. true
, false
and nil
are values distinct from the numbers, and they compare with each others
and with other values in different ways than in C.
They are also objects. Which means we lose out on some of the simplest ways of
doing comparisons and turning the comparison results into a value. We may find
people doing things like if <some expression>.nil?
.
nil
and false
both evaluate to false in a conditional, but nil != false
.
So far "faking it" have worked, because with a few exceptions like the ones above, the C and Ruby variations are relatively compatible. But it's not a lasting solution.
There is another problem: If we change basic contructs like the s-expression if
to
work on Ruby objects, we'll find it hard to implement the "plumbing" under Ruby.
Don't bring out the pickaxe just yet (groan). As it happens, our compiler compiles two very different languages: The s-expression inspired low level language used both as the compilation target for Ruby and implementation language for low level features, and Ruby itself.
The former language is de facto typeless, like BCPL: We pass values around with wild abandon, and we even clobber Ruby local variables and instance variables with it, but what meaning these values have depends entirely on usage rather than the type of the variable (as in C) or a type attached to the value itself, like in Ruby.
And as it happens, here lies both the problem and solution to our conundrum from above:
If only the compiler can know when it is dealing with real Ruby values, and when
it is dealing with something else, then, e.g. compile_if
can generate different
code in these situations.
Not only that: We will need this information when we eventually get tired of leaking memory and start adding a garbage collector - otherwise we're stuck with a conservative collector, so we get twice the benefit.
It will also help us contain the "leakage" of untyped values into Ruby, by letting us define and narrow the rules for when and where and how we're allowed to work with them.
As it happens, we don't need a very complicated type-system: For now we can get away with knowing if a reasonable subset of constructs returns either an object or may contain anything.
That's it. That's the grand total of the static typing we'll introduce this time.
However the changes start laying the groundwork for more static typing that we can use for optimizations and sanity checks. Ultimately I wish to relegate the "s-expression plumbing" to a very restricted space.
Apart from just categorizing stuff into two types, there's another limitation too
for now: Where we act on type information, we will treat all variables as typed to
objects, and all return variables from method calls to be typed as objects. We
will implicitly assume that the s-expression syntax will be contained, though we
are not yet verifying that. In some cases this will be outright wrong. E.g. this
explicitly prevents if foo; bar; end
from working correctly if foo
is not an
object, and happens to contain 0, and in any number of similar instances, so it
is likely introducing some regressions (I caught one while writing this - there are
probably more).
First, let's put some basic test cases in place. You'll find them in d22b95f
Then lets start putting our new typing into place. Let's start with a Value
class
to hold a possibly typed value (in 3ec81cb):
require 'delegate'
# Used to hold a possiby-typed value
# Currently, valid values for "type"
# are :object or nil.
class Value < SimpleDelegator
attr_reader :type
def initialize ob, type = nil
super(ob)
@type = type
end
# Evil. Since we explicitly check for Symbol some places
def is_a?(ob)
__getobj__.is_a?(ob)
end
end
To simplify refactoring, we have it be a delegator, so we only selectively add/change
behaviour as needed. For now, the only new thing is that #type
will return the associated
type tag, or nil. We only support :object
for now, to indicate we know the value to be a
pointer to a Ruby object.
I'm not going to go through ever detail of the changes in compiler.rb
. You can find the full set
in 0ded9c1
Apart from a number of changes to return objects of the new Value
class, the main things
to notice are as follows:
We add :false
, :true
, and :nil
to @global_constants
to prevent them from being treated
as method calls:
+ @global_constants << :false
+ @global_constants << :true
+ @global_constants << :nil
Next up is this change in compile_if
:
- @e.jmp_on_false(l_else_arm, res)
+
+ if res && res.type == :object
+ @e.save_result(res)
+ @e.cmpl(@e.result_value, "nil")
+ @e.je(l_else_arm)
+ @e.cmpl(@e.result_value, "false")
+ @e.je(l_else_arm)
+ else
+ @e.jmp_on_false(l_else_arm, res)
+ end
+
What's happening here is that instead of assuming an untyped value, we check to see if we know we have an object. If we do, and we come across "if result; ...; else ...; end", we change the code to effectively do the equivalent of:
if result != nil && result != false
# if block
else
# else block
end
There's an equivalent change for compile_while
.
Furthermore there's a few minor additional changes to scope.rb
to prevent true/false/nil from
being treated as method calls in 52f31ad3
We also need to check in transform.rb
that we're not trying to treat true, false and nil as
local variables. See f2af5fc
In order to make these changes work, we also need to modify the runtime in various ways.
Most obviously, we need to actually make true
, false
and nil
real objects. We do that in
lib/core/core.rb
:
+require 'core/true'
+true = TrueClass.new # FIXME: MRI does not allow creating an object of TrueClass
+require 'core/false'
+false = FalseClass.new # FIXME: MRI does not allow creating an object of FalseClass
+require 'core/nil'
+nil = NilClass.new # FIXME: MRI does not allow creating an object of NilClass.
+
# OK, so perhaps this is a bit ugly...
self = Object.new
@@ -59,9 +66,6 @@ STDERR = 1
STDOUT = IO.new
ARGV=7
Enumerable=8 #Here because modules doesn't work yet
-nil = 0 # FIXME: Should be an object of NilClass
-true = 1 # FIXME: Should be an object of TrueClass
-false = 0 # FIXME: Should be an object of FalseClass
These depends on very basic initial implementations of TrueClass
, FalseClass
and
NilClass
- see c356591
Another change is in lib/core/fixnum.rb
, where all the comparison operators needs to change:
def == other
- %s(eq @value (callm other __get_raw))
+ %s(if (eq @value (callm other __get_raw)) true false)
end
This is because %s(eq ..)
etc. does not handle typing yet (and they may not necessarily
ever need it), so we use our newly typed %s(if ..)
coupled with explicitly returning the
right objects instead of the numeric values we'd previously get.
It is important to do this in particular as one of the changes I snuck past in compiler.rb
assumes that method calls returns Ruby objects.
Almost done now, but there's also a minor change to lib/core/object.rb
to remove the horribly
hacky true
and false
methods we used previously.
As it happens, we have a few more things to do: %s(and ..)
and %s(or ...)
needs to take
type information into account to be able to generate proper code for e.g.:
if a and b
...
elsif a or c
...
end
In our new world, a and b
(or a && b
) will always be true, because both true
and false
have integer values that are non-null. Similarly a or c
/ a || c
will always be true as well,
since both values will be seen to evaluate to true.
First of all, I've added a test case to catch this, in 1dfe043. But one of our other test
cases shows a regression as well. features/inputs/strcmp.rb
now gives wrong results, because
we previously relied on being able to use "plain Ruby" if
to check the result of a call to
strcmp
that we stored in a local variable. But for now at least, we're assuming variables
contain objects. We'll likely want to refine that, but for now we'll apply a workaround that
will work (in 32ddcde):
%s(assign res (if (strcmp @buffer (callm other __get_raw)) false true))
By explicitly assigning with the values false
or true
, from the result of a value that
will get an indeterminate type, it will work again.
But lets fix "&&"/"and". Firstly we need to actually store the return value from compile_eval_arg
for if_arm
and else_arm
(in 2ae727d):
- compile_eval_arg(scope, if_arm)
+ ifret = compile_eval_arg(scope, if_arm)
@e.jmp(l_end_if_arm) if else_arm
@e.local(l_else_arm)
- compile_eval_arg(scope, else_arm) if else_arm
+ elseret = compile_eval_arg(scope, else_arm) if else_arm
Secondly, we need to determine type based on them. Most importantly, we can only
safely return a type that is shared by both of them if both if
and else
are
present (also in 2ae727d):
- return Value.new([:subexpr])
+ # We only return a specific type if there's either only an "if"
+ # expression, or both the "if" and "else" expressions have the
+ # same type.
+ #
+ type = nil
+ if ifret && (!elseret || ifret.type == elseret.type)
+ type = ifret.type
+ end
+
+ return Value.new([:subexpr], type)
end
Other than that, we're simply just adding our missing compile_or
:
(EDIT: This implementation is broken; a correct version will be in part 41)
+ def compile_or scope, left, right
+ compile_if(scope, left, false, right)
+ end
And that's it for this time.