This is part of a series I started in March 2008 - you may want to go back and look at older parts if you're new to this series.From now on I'm not going to cover every little change - too much of it is repetitive, but I'll keep posting parts of this series to cover major changes. If you want to keep track of the changes "blow by blow", follow the project on GitHub. If you find something I've skipped over that you'd like to know more about, I'd be happy to expand on it in the comments, or perhaps cover it in the next part.
We'll keep the s-expression grammar separate from the grammar for the rest of the language, so in the BNF grammar below all productions refers to other parts of this grammar other than "sexp" which refer to the start production of the s-expression grammar. This grammar covers the first set of extension for this part:
program ::= exp* ws* exp ::= ws* (def | sexp) def ::= "def" ws* name args? ws* defexp* "end" defexp ::= sexp
args ::= ws* sexp name ::= atom
This simple initial grammar will let us change some of the examples. Here's a snippet from testargs.l:
def g %s(i j) %s(let (k) (assign k 42) (printf "numargs=%ld, i=%ld,j=%ld,k=%ld\n" numargs i j k) ) end
Not exactly a tremendous difference, but it's a start, and it puts the basics in place to chip away at the parser bit by bit, so lets look at some code. It's fairly straightforward, so I'll just go through the "def" production - you can see the rest of this version here.
# def ::= "def" ws* name args? ws* defexp* "end" def parse_def return nil if [email protected]("def") @s.ws raise "Expected function name" if !(name = parse_name) args = parse_args @s.ws exps = [:do] while e = parse_defexp; exps << e; end raise "Expected expression of 'end'" if [email protected]("end") return [:defun, name, args, exps] endAs for the s-expression parser it's a fairly straightforward translation of the BNF into a recursive descent parser, with some minor additional error handling.
program ::= exp* ws* exp ::= ws* (def | sexp) def ::= "def" ws* name args? ws* defexp* "end" defexp ::= sexp args ::= ws* sexp name ::= atom args ::= nolfws* ( "(" ws* arglist ws* ")" | arglist ) arglist ::= ("*" ws*)? name nolfws* ("," ws* arglist)? nolfws ::= [ \t]+ ; This rule is defined in the scanner source code, like ws
The new thing in the updated grammar is redefining "args" and adding "arglist". Those rules effectively says:
A set of args consist of optional whitespace excluding line feed (and carriage return) followed by either an arglist, or an arglist in parentheses. An arglist consists of of an optional "*" ("splat",equivalent to our :rest flag) followed by a name, whitespace excluding lf, and optionally ",", whitespace and arglist. In other words we use recursion allow a list of arguments (and just while writing this I realized that grammar is stupid, as it allows "*" in front of all arguments, which makes no sense - to be fixed...)
So the example from above can be changed to this now:
def g(i,j) %s(let (k) (assign k 42) (printf "numargs=%ld, i=%ld,j=%ld,k=%ld\n" numargs i j k) ) end
One step closer to a nice syntax...
One more thing is new: Pass the "--parsetree" option to the compiler to dump the parsetree (using PP) together with some crappy error reporting:
[vidarh@dev writing-a-compiler-in-ruby]$ cat testargs.l | ruby compiler.rb --parsetree [:do, [:defun, :f, [:test, [:arr, :rest]], [:do, [:let, [:i], [:assign, :i, 0], [:while, [:lt, :i, [:sub, :numargs, 1]], [:do, [:printf, "test=%ld, i=%ld, numargs=%ld, arr[i]=%ld\n", :test, :i, :numargs, [:index, :arr, :i]], [:assign, :i, [:add, :i, 1]]]]]]], [:defun, :g, [:i, :j], [:do, [:let, [:k], [:assign, :k, 42], [:printf, "numargs=%ld, i=%ld,j=%ld,k=%ld\n", :numargs, :i, :j, :k]]]], [:f, 123, 42, 43, 45], [:g, 23, 67]] [vidarh@dev writing-a-compiler-in-ruby]$