I wrote about
syntax highlighting in Ruby earlier. The Ruby Syntax library supports Ruby, YAML and XML out of the box, but it's also pretty easy to extend to handle other languages.
Since I've been writing my
compiler in Ruby series and including a lot of x86 assembler, I figured I'd see how much (or little) work adding a syntax highlighter for assembler would take.
It's by no means perfect - I've only spent half an hour or so throwing this together, but it's reasonable, and easy enough to keep adjusting.
Here it is:
require 'rubygems'
require 'syntax'
class AsmTokenizer < Syntax::Tokenizer
  def setup
    @state = :newline
  end

  def step
    @state = :newline if bol?
    if @state == :newline
      # Start of a line: handle labels, directives and instruction mnemonics
      if label = scan(/[a-zA-Z.][a-zA-Z0-9_]*:/) then start_group :label, label
      elsif words = scan(/\.[a-zA-Z0-9_]*/)
        start_group :directive, words
        @state = :operands
      elsif words = scan(/[a-zA-Z]+/)
        start_group :operator, words
        @state = :operands
      # Nothing matched: consume one character so we always make progress
      else start_group(:normal, getch)
      end
    else
      # Rest of the line: handle operands and assorted punctuation
      if words = scan(/,/) then start_group :comma, words
      elsif words = scan(/[\-0-9][0-9]*/) then start_group :number, words
      elsif words = scan(/%[a-zA-Z]+/) then start_group :register, words
      elsif words = scan(/[\.a-zA-Z][a-zA-Z0-9]*/) then start_group :label, words
      elsif words = scan(/\$/) then start_group :value, words
      elsif words = scan(/[\(\)]/) then start_group :paren, words
      elsif words = scan(/\".*\"/) then start_group :quoted, words
      # Nothing matched: consume one character so we always make progress
      else start_group(:normal, getch)
      end
    end
  end
end
# Register the custom highlighter
Syntax::SYNTAX['asm'] = AsmTokenizer
I don't have time to write a lot of explanation;
the Syntax manual does a reasonable job of describing it. The one pitfall to be aware of is that you
must make sure that your #step method advances at least one character no matter what, or you'll get stuck in an infinite loop.
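To make that pitfall concrete, here is a minimal sketch that uses Ruby's standard StringScanner directly instead of the Syntax library (as far as I can tell, Syntax::Tokenizer wraps a StringScanner internally). The names demo_step and tokenize are made up for this illustration and are not part of the Syntax API. The point is the final else branch: when nothing matches, getch still consumes one character, so the loop always makes progress.

```ruby
require 'strscan'

# Hypothetical stand-in for a tokenizer's #step method.
# Returns a [group, text] pair and always consumes at least one character.
def demo_step(scanner)
  if (text = scanner.scan(/%[a-zA-Z]+/))
    [:register, text]
  elsif (text = scanner.scan(/-?[0-9]+/))
    [:number, text]
  else
    # Fallback: nothing matched, so take exactly one character.
    # Without this branch the scanner position would never move.
    [:normal, scanner.getch]
  end
end

# Hypothetical driver loop, roughly what a tokenizer does with #step.
def tokenize(source)
  scanner = StringScanner.new(source)
  tokens = []
  until scanner.eos?
    before = scanner.pos
    tokens << demo_step(scanner)
    # Guard that turns a would-be infinite loop into an immediate error.
    raise "step did not advance at position #{before}" if scanner.pos == before
  end
  tokens
end

tokenize("$-16, %esp").each do |group, text|
  puts "#{group}: #{text.inspect}"
end
```

The position check in tokenize is optional, but while you are still adjusting the regular expressions it is a cheap way to catch a branch that matches the empty string.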
Here's an example of how to use the new highlighter:
require 'syntax/convertors/html'
convertor = Syntax::Convertors::HTML.for_syntax "asm"
puts convertor.convert( File.read("/tmp/step5.s"))
And here's some CSS to color the output:
body { background: black; }
.directive { color: purple; }
.comma { color: white; }
.paren { color: white; }
.value { color: white; }
.number { color: yellow; }
.label { color: blue; }
.register { color: brown; }
.operator { color: lightgrey; }
.quoted { color: green; }
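To view the result in a browser, the highlighted fragment and the CSS need to end up in the same page. Below is a rough sketch of one way to do that. The helper name highlighted_page is hypothetical, and the sample fragment is only my assumption of what the HTML convertor's output looks like (a pre block with one span per token group); in practice you would pass in the string returned by convertor.convert.

```ruby
# Hypothetical helper: wraps an already highlighted HTML fragment and a
# stylesheet into a complete, standalone page.
def highlighted_page(fragment, css)
  <<~HTML
    <!DOCTYPE html>
    <html>
    <head>
    <meta charset="utf-8">
    <title>Highlighted assembler</title>
    <style>
    #{css}
    </style>
    </head>
    <body>
    #{fragment}
    </body>
    </html>
  HTML
end

# Assumed shape of the convertor output, hand-written for this example.
sample_fragment = '<pre><span class="operator">ret</span></pre>'

# A cut-down version of the stylesheet.
sample_css = <<~CSS.chomp
  body { background: black; }
  .operator { color: lightgrey; }
CSS

puts highlighted_page(sample_fragment, sample_css)
```

Redirecting that output to a file gives you something you can open directly in a browser.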
And some example output from the compiler series:
    .text
    .globl main
    .type main, @function
main:
    leal 4(%esp), %ecx
    andl $-16, %esp
    pushl -4(%ecx)
    pushl %ebp
    movl %esp, %ebp
    pushl %ecx
    subl $4,%esp
    movl $.LC0,(%esp)
    call puts
    addl $4, %esp
    popl %ecx
    popl %ebp
    leal -4(%ecx), %esp
    ret
    .size main, .-main
    .section .rodata
.LC0:
    .string "Hello World"