Writing a compiler in Ruby bottom up - step 3 2008-04-06

This is part of a series I started in March 2008 - you may want to go back and look at older parts if you're new to this series.

I hoped to post this earlier, but I've just been swamped this week - though cleaning up my old material so far have only taken about half an hour per post. Anyway, here is part three, and at the end I've posted a list of roughly what will be covered in the following parts. Roughly because I've tried grouping a few of them together with the intent of posting longer, more meaningful parts (this part is a good example why - it's very short), so I've cut my original thirty or so parts down to 20 (though that's just what I finished writing - that by no means finishes the job).

Chaining expressions with "do"

So far, the second version from last time can execute one single expression, and that's it. Not very useful. So I need to add a way to chain expressions like in the body of a function. As it stands, that's terribly simple. I'll add a keyword - "do" - that will accept an unlimited (meaning, in practice, limited by memory) list of expressions and evaluate them. It'll look like this:

    prog = [:do, 
       [:printf," "], 

It's actually trivially simple. Right at the beginning of #compile_exp we add this:

        if exp[0] == :do                                                                                                                                                        
          exp[1..-1].each { |e| compile_exp(e) }                                                                                                                                

Recursion will play a great role here - you're descending a tree structure after all, so the core method to compile an expression will be called on deeper and deeper nodes as well, especially to handle proper sub-expressions, which is our next goal.

Adding sub-expressions, take 1

Here's the test we'd like to make work:

    prog = [:printf,"'hello world' takes %ld bytes\\n",[:strlen, "hello world"]], 

Here's the first change. In #get_arg, we add this at the start:

        # Handle strings or subexpressions                                                                                                                                      
        if a.is_a?(Array)
          return nil # What should we return?                                                                                                                                   

If you try to make the code above compile the test, you'll get an error from gcc, because we expect get_arg to return a sequence number for the string constants and nothing else, but that clearly doesn't work for sub expressions.

Adding sub-expressions, take 2: Return values

So how does GCC handle this. Getting the assembly for this:

int main()
  printf("'Hello world' takes %ld bytes\n",foo("Hello world"));

... results in this (just the relevant part of main):

        subl    $20, %esp
        movl    $.LC0, (%esp)
        call    foo
        movl    %eax, 4(%esp)
        movl    $.LC1, (%esp)
        call    printf
        addl    $20, %esp

... which shows that it's pretty straightforward. Gcc first calls the sub expression (foo), and then expects the return value of a function to be in the register "%eax", and so that needs to be copied onto the stack instead of the address of a string constant.

First up let's fix #get_arg:

      def get_arg(a)                                                                                                                                                            
        # Handle strings or subexpressions                                                                                                                                      
        if a.is_a?(Array)                                                                                                                                                       
          return [:subexpr]                                                                                                                                                     
        seq = @string_constants[a]                                                                                                                                              
        return seq if seq                                                                                                                                                       
        seq = @seq                                                                                                                                                              
        @seq += 1                                                                                                                                                               
        @string_constants[a] = seq                                                                                                                                              
        return [:strconst,seq]                                                                                                                                                  

The only changes are the "return" expressions, where we indicate what is returned - this will expand considerably later.

The remaining change is pretty much a rewrite of the rest of compile_exp. Instead of just collecting the results of get_arg, we iterate over it and output directly (which is why the stack adjustment's also change, since the "args" array has gone away):

        stack_adjustment = PTR_SIZE + (((exp.length-1+0.5)*PTR_SIZE/(4.0*PTR_SIZE)).round) * (4*PTR_SIZE)
        puts "\tsubl\t$#{stack_adjustment}, %esp" if exp[0] != :do
        exp[1..-1].each_with_index do |a,i| 
          atype, aparam = get_arg(a)
          if exp[0] != :do
            if atype == :strconst
              param = "$.LC#{aparam}"
              param = "%eax"
            puts "\tmovl\t#{param},#{i>0 ? i*4 : ""}(%esp)"

As you can see the change isn't that complex. We just check the return value from #get_arg and pick a string constant or %eax depending. This part will expand considerably as we add more different types of things that can be returned.

You can find the latest version here

Upcoming parts

These are the almost ready pre-written parts only. I expect once I need to start catching up on new parts the focus will be on a simple parser with the goal of making the compiler self-hosted (i.e. able to compile itself) as quickly as possible.

blog comments powered by Disqus