I've read a lot of research papers over the years, but there's one I keep coming back to:
Professor Michael Franz's doctoral dissertation:
Code-Generation On-the-Fly:
A Key to Portable Software (PDF).
I've been harping on about Semantic Dictionary Encoding (SDE) ever since I first read the paper back in 1994, and got quite close to actually implementing it at one point. So let me give a brief overview, since I can't drag it out often enough (someone take it mainstream, damn it!):
SDE is, at its most basic, a compression mechanism for the intermediate representation of a compiler. Similarly to
Huffman coding, an SDE encoder and decoder pair each build a dictionary from the data stream - the encoder while producing it, the decoder while consuming it. The dictionary entries are then used to encode/decode the following parts of the data stream, which in turn causes more entries to be added to the dictionary, and so on. This means that the later you are in the data stream, the more of the stream refers to previously added dictionary entries instead of having to spell out the full definition.
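To make the mechanism concrete, here's a toy sketch in Ruby - my own illustration, not Franz's actual encoding: expressions are nested arrays, and the encoder and decoder grow identical dictionaries in lockstep, so a repeated subexpression can travel as a tiny index.

```ruby
# Toy sketch of the SDE idea (illustrative only, not Franz's real format).
# Both sides add dictionary entries in the same order, so indices agree.
class SdeEncoder
  def initialize
    @dict = {}    # subexpression => dictionary index
  end

  # expr is a nested array, e.g. [:+, [:*, :a, :b], [:*, :a, :b]]
  def encode(expr)
    return expr unless expr.is_a?(Array)
    return [:ref, @dict[expr]] if @dict.key?(expr)  # seen before: reference it
    encoded = expr.map { |e| encode(e) }            # children register first...
    @dict[expr] = @dict.size                        # ...then the parent
    [:def, encoded]
  end
end

class SdeDecoder
  def initialize
    @dict = []    # dictionary index => decoded subexpression
  end

  def decode(item)
    return item unless item.is_a?(Array)
    case item.first
    when :ref then @dict[item[1]]                   # reuse an earlier entry
    when :def
      decoded = item[1].map { |e| decode(e) }       # children register first,
      @dict << decoded                              # mirroring the encoder
      decoded
    else
      item.map { |e| decode(e) }                    # plain operator node
    end
  end
end

encoder = SdeEncoder.new
stream  = encoder.encode([:+, [:*, :a, :b], [:*, :a, :b]])
# The second [:*, :a, :b] travels as a two-element [:ref, 0] rather than
# the full definition; SdeDecoder.new.decode(stream) rebuilds the tree.
```

A real implementation would of course serialize to a compact binary form rather than nested arrays, but the lockstep dictionary building is the essence of the scheme.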
But to call it just a compression method would be selling it very short. The appeal of SDE is that it can serve as a machine-independent representation of a program while at the same time (if used properly) retaining far more semantic information about the program than a typical bytecode. Unlike most bytecodes, SDE retains the structure of the program as represented by the compiler.
This has a few very appealing properties beyond the usual benefits of JIT compilation (such as being able to target specific CPU models):
- When (re)building the dictionary on decoding, you can store auxiliary information and even partially generated code, to speed up generation of subsequent pieces of code that use that dictionary element. The compression removes the need for pattern recognition - the pattern recognition was already done at encoding time. Franz specifically tested outright code copying to speed up code generation: small pieces of generated code were tracked to see if they were invariant, in which case future uses of the same dictionary entry would simply copy the previously generated piece.
- You retain the ability to do JIT, but have high level flow information available, making it trivial to do things like sub-expression elimination and a number of similar "cheap" optimizations at JIT time based on knowledge available at runtime. The more you can defer generation and the more additional information becomes available before generation, the better you can do.
- The code generator can be modified to recognize higher-level dictionary elements that can be specifically optimized, while dictionary definitions for those elements can still be provided that represent them in terms of lower-level patterns. What that buys you is a backwards-compatible way of improving performance: significantly better code when running on systems or platforms for which the newest decoder is available, while still retaining full compatibility with the lowest common denominator - lowering the threshold both for porting and for innovation.
- SDE retains the block structure and type information of the program, making it well suited for highly dynamic languages (such as Ruby) that otherwise tend to result either in complex, bloated and slow bytecode or in highly specialized, language-specific VMs. With SDE, a highly specialized decoder can still exist, but by providing dictionary entries that act as templates for the specialized constructs of the language as part of the program, a generic decoder can still execute the program. The same benefits apply to more static languages, but they would appear to be more pronounced for dynamic languages, where the final generated code can be far more effectively optimized once more information is known (i.e. in Ruby you can potentially do a far better job if you can JIT and cache specialized versions of code that take into account the actual layout of a class once all code that may reopen and extend/modify the class has been loaded - this doesn't really apply to languages where the structure of a class is known definitively at some point before the program is run).
- Because of the level of semantic information retained in the file format, SDE is not at odds with generating native binaries. A typical VM such as the JVM may suffer a semantic disconnect with the destination architecture, meaning a native binary generated from its bytecode can end up varying significantly from what a code generator would have produced with full knowledge of both the high-level structure of the program and the target architecture; SDE doesn't have that issue, since very little of the semantics of the original program is lost in translation. This means that a compiler can reasonably be split into a portable part outputting SDE binaries, and either a runtime SDE decoder/linker and/or a linker producing native code without loss of information. Such a native code generator can be run either prior to delivery of the code or on installation, to optimize for a specific CPU model, and can use more expensive optimizations than might be acceptable for runtime code generation.
- Instructions for tracing and additional safety (bounds checking etc. in languages that don't require it) can be generated automatically at code-generation time. Thus you can support constructs that would slow down the final binary and only turn them on if needed. This can go as far as leaving in debug assertions and logging constructs that can be turned on at the command line, but need never even make it into the generated code if the appropriate switch isn't present, thus allowing extra capabilities to be included without burdening the system in normal use. For programs where large parts of the functionality are switched on or off from the command line, large parts of the program may simply get discarded as a result of straightforward sub-expression elimination in the code generator, something which is particularly beneficial where the various parts cross-cut the same algorithm and substantial simplifications can be made.
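The code-copying experiment from the first bullet can be sketched roughly like this - a hypothetical CachingCodegen of my own invention, with generate standing in for a real code generator:

```ruby
# Rough sketch (my illustration, not Franz's implementation) of code
# copying per dictionary entry: the first code-generation run for an
# invariant entry is cached, and later uses just copy the result.
DictEntry = Struct.new(:expr, :invariant, :cached_code)

class CachingCodegen
  attr_reader :full_runs              # how often the expensive path ran

  def initialize
    @full_runs = 0
  end

  def code_for(entry)
    return entry.cached_code.dup if entry.invariant && entry.cached_code
    code = generate(entry.expr)       # expensive path: real code generation
    entry.cached_code = code if entry.invariant
    code
  end

  private

  # Stand-in for an actual code generator emitting machine code.
  def generate(expr)
    @full_runs += 1
    "code<#{expr.inspect}>"
  end
end
```

The point is that the cache lives on the dictionary entry itself, so no separate lookup structure or pattern matching is needed to find reusable code.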
Note that some of the above is beyond the scope of the original dissertation, though all of it is hinted at or implied by the architecture.
One thing that may or may not be obvious from the above is that SDE, beyond efficiently encoding a tree version of the intermediate representation, actually turns the tree into a directed graph: nodes further down in the tree get reused, linking the tree together wherever common sub-expressions occur.
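In compiler terms this is essentially hash-consing. A minimal sketch (names and structure are mine, for illustration):

```ruby
# Minimal hash-consing sketch: identical subtrees are interned in a pool,
# so a repeated subexpression becomes the very same node object, turning
# the expression tree into a directed acyclic graph.
class NodePool
  def initialize
    @pool = {}
  end

  def node(op, *children)
    key = [op, *children.map(&:object_id)]  # identity of children, not value
    @pool[key] ||= { op: op, children: children }
  end
end

pool = NodePool.new
a = pool.node(:var, :a)
b = pool.node(:var, :b)
# (a * b) + (a * b): both calls return the one shared node for a * b
sum = pool.node(:+, pool.node(:*, a, b), pool.node(:*, a, b))
```

Here both operands of the sum are one and the same object, which is exactly what the dictionary references give you in the encoded stream.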
Compared to a typical VM, SDE has one potential drawback in terms of delayed code generation: the dictionary can potentially grow quite large. This is ameliorated by scoping - by marking the start of a scope in the dictionary, and then throwing away any data specific to the local scope on reaching its end, the dictionary can be kept fairly compact.
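One way to picture the scoping - my reading of the mechanism, not Franz's exact data structure:

```ruby
# Sketch of scope-based pruning: a mark records where a scope's entries
# begin, and leaving the scope discards everything added after the mark.
class ScopedDictionary
  def initialize
    @entries = []
    @marks = []
  end

  def add(entry)
    @entries << entry
    @entries.size - 1        # index the encoded stream can refer to
  end

  def enter_scope
    @marks << @entries.size  # remember where the local scope begins
  end

  def leave_scope
    @entries.slice!(@marks.pop..)  # drop all entries local to the scope
  end

  def size
    @entries.size
  end
end
```

Since entries are only ever appended, pruning a scope is just truncating the array back to the mark, so the bookkeeping cost is trivial.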
Performance concerns?
Franz addressed performance reasonably thoroughly, showing that SDE in most cases - on a mid-90s system based around a 40MHz M68040 CPU - performed very reasonably compared to loading native binaries. Franz mused that CPU speed at the time was increasing faster than disk speeds, and that there was a very real possibility the balance might shift so that loading SDE would eventually become faster than loading native binaries. I don't know if that held up - it'd be interesting to redo those tests.
As for runtime performance, the paper contains a few benchmark tests against the then-current MPW C implementation for the Mac, and the tested Oberon code using SDE beat the corresponding C code hands down.
What's happened since?
Not much, unfortunately. Franz has continued work on various code generation methods, but Java came along around the same time and more or less monopolized interest in the portability aspect. At least MacOberon had support for SDE, but as with much other great stuff coming out of the
ETH computer science department, it was hampered by its ties to Oberon, which fell out of favor together with Modula- and Pascal-type languages.
Franz's publications page is well worth a look - he's continuing to churn out great stuff.
My interest in SDE
I first got interested in SDE as a potential feature to contribute during a long-winded debate on comp.sys.amiga.misc back in '94 about writing an open source replacement for
AmigaOS. That effort fell by the wayside (but one of the original participants, Aaron Digulla, has stuck with it, and
AROS is probably the closest thing we have today to a real open source descendant of AmigaOS).
At various times I've wanted to resurrect my original attempts, but never have had the time.
This time? We'll see - I'm
loosely thinking about adding SDE support as one of the forthcoming steps in my
series about writing a compiler in Ruby. I've learned a lot about compiler technology in the meantime, and I got quite far the first time - SDE isn't hard to do, anyway - so I'm hoping I'll find the time to do a working implementation at some point in the not too distant future.