XML Parser with my parser assembler

2005-04-23

One of the best ways to test a new translator, compiler, or interpreter is to feed it complex input, so I set about writing an XML parser with the parser assembler.

It took me less than four hours to get a more-or-less working SAX parser for the complete XML grammar, and more than half of that time went into troubleshooting the parser assembler itself (I found a few problems, and a few places where it needs enhancements).

Now, an XML parser like this is what prompted me to look into a push parser architecture in the first place.
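To illustrate what "push" means here, a minimal sketch in Python (not the parser assembler's actual interface): the caller pushes arbitrary chunks of input through `feed()`, and the parser keeps its state between calls and fires SAX-style events as soon as it recognizes them. The class name `PushTagParser` and the `(kind, value)` callback are hypothetical, and the toy only understands bare start tags, end tags and text, so it is nowhere near real XML.

```python
from typing import Callable, List


class PushTagParser:
    """Toy push parser: the caller pushes chunks in, the parser fires events.

    Understands only '<name>', '</name>' and text between tags: no attributes,
    comments, entities or well-formedness checks.
    """

    def __init__(self, on_event: Callable[[str, str], None]) -> None:
        self.on_event = on_event    # SAX-style callback taking (kind, value)
        self.in_tag = False         # parser state survives between feed() calls
        self.buf: List[str] = []    # partial tag name or text carried across chunks

    def feed(self, chunk: str) -> None:
        """Consume one chunk of any size, firing events for every construct it completes."""
        for ch in chunk:
            if self.in_tag:
                if ch == ">":
                    name = "".join(self.buf)
                    self.buf.clear()
                    self.in_tag = False
                    if name.startswith("/"):
                        self.on_event("end", name[1:])
                    else:
                        self.on_event("start", name)
                else:
                    self.buf.append(ch)
            elif ch == "<":
                self._flush_text()
                self.in_tag = True
            else:
                self.buf.append(ch)

    def close(self) -> None:
        """Signal end of input and flush any pending text."""
        if self.in_tag:
            raise ValueError("input ended inside a tag")
        self._flush_text()

    def _flush_text(self) -> None:
        if self.buf:
            self.on_event("text", "".join(self.buf))
            self.buf.clear()


# Usage: the tag name and the text are split across chunk boundaries.
events = []
parser = PushTagParser(lambda kind, value: events.append((kind, value)))
for chunk in ["<gre", "eting>Hel", "lo</greeting>"]:
    parser.feed(chunk)
parser.close()
```

The point of the design is that a tag split across two chunks (`<gre` / `eting>`) is handled without the caller buffering anything: the parser never pulls input, it only reacts to what it is given.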

The XML grammar was essentially the grammar from the W3C XML specification, with some minor changes to match my BNF variant and to add triggers for semantic actions.

It's not 100% complete yet (for instance, I don't restrict all the character classes correctly), but it's fairly close. I also don't handle encodings properly yet, but the parser architecture makes that fairly easy: I need to add support for byte order marks, then place a trigger on the encoding attribute and let the trigger handler plug in a filter object that decodes the input from the right encoding (provided it's one I want to support).
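A rough sketch of the byte-order-mark and encoding-filter idea in Python, assuming a byte-oriented input stream. `sniff_bom`, `DecodingFilter`, `on_encoding_decl` and the `SUPPORTED` set are hypothetical names; the decoding itself is delegated to Python's standard `codecs` incremental decoders, which stand in for whatever filter objects the parser assembler would actually plug in.

```python
import codecs
from typing import Optional, Tuple

# BOMs paired with codec names. The UTF-32-LE BOM (FF FE 00 00) starts with the
# UTF-16-LE BOM (FF FE), so the 4-byte entries must be checked first.
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]

# Hypothetical policy: the declared encodings this parser is willing to handle.
SUPPORTED = {"utf-8", "utf-16", "iso-8859-1"}


def sniff_bom(data: bytes) -> Tuple[Optional[str], int]:
    """Return (codec name, BOM length in bytes), or (None, 0) if no BOM is present."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


class DecodingFilter:
    """Sits between the raw byte stream and the parser: bytes in, text out.

    The incremental decoder keeps partial multi-byte sequences between calls,
    so chunk boundaries may fall anywhere.
    """

    def __init__(self, encoding: str) -> None:
        self._decoder = codecs.getincrementaldecoder(encoding)()

    def feed(self, data: bytes) -> str:
        return self._decoder.decode(data)

    def close(self) -> str:
        # final=True raises UnicodeDecodeError if the input ended mid-character
        return self._decoder.decode(b"", final=True)


def on_encoding_decl(name: str) -> DecodingFilter:
    """Hypothetical trigger handler for encoding="..." in the XML declaration."""
    key = name.lower()
    if key not in SUPPORTED:
        raise ValueError(f"unsupported encoding: {name}")
    return DecodingFilter(key)


# Usage: a UTF-16-LE document with a BOM, pushed in two chunks that split a character.
raw = codecs.BOM_UTF16_LE + '<?xml version="1.0"?><a/>'.encode("utf-16-le")
enc, skip = sniff_bom(raw)
f = DecodingFilter(enc)
text = f.feed(raw[skip:5]) + f.feed(raw[5:]) + f.close()
```

A real implementation would also need to buffer the first few bytes before sniffing, and to reconcile a BOM with the declared encoding; this sketch skips both.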

The entire parser assembler is prepared to handle full Unicode anyway: it uses 32-bit characters internally.
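With every character held as a full code point, checks like the character classes mentioned above become simple integer range tests. A small Python sketch, assuming the input has already been decoded: `to_code_points` and `first_invalid_char` are hypothetical helpers, while the ranges in `is_xml_char` are the `Char` production [2] from the XML 1.0 specification. The largest code point, U+10FFFF, needs 21 bits, which is why 16-bit characters are not enough and 32-bit ones are.

```python
from typing import List, Optional


def to_code_points(data: bytes, encoding: str = "utf-8") -> List[int]:
    """Decode bytes into one integer per Unicode code point (at most 0x10FFFF)."""
    return [ord(ch) for ch in data.decode(encoding)]


def is_xml_char(cp: int) -> bool:
    """XML 1.0 production [2]: Char ::= #x9 | #xA | #xD | [#x20-#xD7FF]
    | [#xE000-#xFFFD] | [#x10000-#x10FFFF]"""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def first_invalid_char(cps: List[int]) -> Optional[int]:
    """Index of the first code point that is not a legal XML Char, or None."""
    for i, cp in enumerate(cps):
        if not is_xml_char(cp):
            return i
    return None


# U+1D11E (musical G clef) lies outside the 16-bit range: UTF-8 spends 4 bytes on it
# and UTF-16 a surrogate pair, but here it is a single code point.
cps = to_code_points("a\U0001D11Eb".encode("utf-8"))
```

Because surrogate pairs never appear in the code point list, the check does not need any special handling for characters above U+FFFF.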

I love my new toy :)