Type composition in static languages 2005-03-15


Over the years, one of the things that have kept fascinating me is the subject of runtime type composition. In other words: Combining a complete type from separate implementations of interfaces.

Or more specifically, how to do this in an efficient way in a statically typed language, such as C++. Languages such as Smalltalk and ECMAscript / Javascript supports this easily, but apart from syntactic ease of composition, their approaches does not do much better than what you can reasonably easily implement in C++.

One article always worth mentioning is Protocol Extension: A Technique for Structuring Large Extensible Software-Systems by Dr. Michael Franz, who outlines an experimental approach he deviced for Oberon.

Warning: Long rant following... Lots of implementation details glossed over :-)

Quick and dirty type composition in C++

A basic approach to type composition, or protocol extension in C++ is something like the code below, which mirrors an approach used by some component systems:

#include <map> #include <string> #include <iostream> // ---- Generic stuff -------------- class InterfaceBase { public: virtual ~InterfaceBase() { } }; class Object { typedef std::map<std::string,InterfaceBase *> IfaceMap; IfaceMap ifaces_; protected: template<typename T> void addImpl(T * ob) { ifaces_.insert(std::make_pair(typeid(T).name(),ob)); } public: template<typename T> operator T *() { IfaceMap::const_iterator i = ifaces_.find(typeid(T).name()); if (i == ifaces_.end()) return 0; return dynamic_cast<T *>(i->second); } }; // --------- Interface declaration class FooInterface : public InterfaceBase { public: virtual void foo() = 0; }; // ---------- Specific interface implementations class FooImpl : public FooInterface { public: void foo() { std::cout << "foo" << std::endl; } }; class FooObject : public Object { public: FooObject() { addImpl<FooInterface>(new FooImpl()); } }; int main() { FooObject ob; FooInterface * foo = ob; foo->foo(); }

The downside of this should be fairly obvious: It is painfully wasteful for small "objects". It's not really type composition at all, merely some syntactic tricks to make the fact that we're dealing with a container of disparate objects.

A way to reduce the wastage for the above approach might be to make sure we only allocate one block of memory only. It's not particularly difficult - we'll need a factor which will need to know which interface implementations to use, and can determine the binary layout of the object, and then create a mapping table, much like the vtable used for virtual methods.

The factory can then allocate the memory and use placement new to create the objects.

The obvious issue with this method is that if you use malloc to create this object, it can't be deleted with the normal delete operator, and if you use the array version of new you have to remember to use the array version of delete.

One workaround is to always wrap them in a smart pointer that knows how to deal with them, or use a garbage collector - otherwise you have to explicitly call a function that knows how to deal with the composed objects.

A better approach is this:

template<int Size> class Object { protected: char data[Size] buffer; // ... the rest... };

So instead of inheriting from Object the class, new types will inherit from Object<SomeCalculatedSize> the template instantiation.

The next refinement here is probably to use typelists with a lot of template magic in order to simplify the creation of the composed types.

Refining the beast

But it's still inefficient: It contains a list of pointers to interface implementations in each object. And it's also cumbersome to use: You can cast TO an interface, but you can't cast FROM an interface to the base object.

The first problem is "easy" to take care of by implementing our own "vtable". Instead of containing just a flat buffer, each Object contains a pointer to an array. Each type implementation can initialize each new object with a pointer to a type specific vtable that gives the offsets from the start of the object to the specific interface implementation.

The second is more problematic. One way of doing this is to put a pointer back to the base of the object at a fixed offset from each interface implementation, but that means wasting an additional pointer for each implemented interface.

Another approach is to use a "fat pointer" - a smart pointer that contains extra state, for instance a pointer to the base object. This saves us extra pointers in the object itself, but double the size of our "pointers", which could be really painful in some applications.

A third approach is to use a specialized smart pointer for each interface that will "cast" an internal pointer to the object to the specific interface implementation pointer on each call. This saves memory at the cost of at best an extra table lookup for each call. This is more or less the approach chosen for the Protocol Extension paper referenced above. However in that case the table contained all methods, not interface implementation types.

A table lookup is only feasible if the number of interfaces used is reasonably small and you can accept allocating a table large enough for all interfaces in the system for each object, though. If not, you'll incure the cost of a hash table lookup instead.

None of these options are particularly attractive. One possibility is supporting more than one of them: Allow casting to a "cheap" smart pointer but not back, and provide a fat pointer alternative if you need to be able to cast both ways.

In the end this probably boils down to application specific needs. But my quest for a method that's simple AND fast enough for general use continues...


blog comments powered by Disqus