Inheritance and configuration options have a cost in terms of increased complexity. In some cases that cost can be avoided by maintaining multiple versions of a component and adding new features to new branches instead of continuing to work on a single code base, in the same way integrated circuits often exist in a wide range of similar, static models with the same basic functionality. Better merging support in modern version control systems makes this model increasingly viable for software.
One thing I've thought a lot about over the years is why software reuse is so hard.
A big problem is that designing reusable software when you don't know where it might be reused is hard.
Over the years, a number of people have brought up integrated circuits as a model for software reuse. I tried to find one of the old articles I read about it this morning, but was unfortunately unable to track it down. But this is not an original idea.
Simple ICs have a number of properties that affect how they are used:
- When they're "complete" they're often never changed other than possibly to fix problems. The design may evolve, but the next "version" tends to be given a new designation and is often treated as a separate product.
- There's often a myriad of different versions with smaller or larger differences - many products exist in variations rather than being configurable. Configurability often adds complexity. In hardware, complexity has a very visible impact.
- Apart from very large, complex general purpose processors, most ICs tend to have very high cohesion, because they have to in order to make financial sense.
- They are "black boxes" in that you can't (or won't) change them, but the details of how you interface with them and how they will respond is well documented and wel understood.
In the software world, meanwhile, we keep trying to design reusable libraries, components and services, and a lot of the time we end up with incredibly complex APIs, because we try to prepare for every eventuality.
The result is both more code (that needs to be tested, and that takes up memory) and abstractions that aren't needed for the core functionality, but that need to be there to facilitate the configurability (which may add significant overhead).
Why should we put up with this?
Distributed version control and reuse
One thing that struck me this morning was that one of the big things distributed version control systems promise is to ease the burden of merging, and that this is a major stepping stone towards a simpler model of reuse.
First of all, let me say that I am not against configurable components. I strongly believe in making classes and libraries generic and reusable in themselves - specifically by ensuring low coupling and high cohesion. However, sometimes making a component highly flexible comes at the cost of reducing cohesion: of making the component try to please everyone at the same time by exposing interfaces that require massively increased complexity in order to avoid exposing internal implementation details, or where the choice is taken to "surrender" and expose the guts of the component for everyone to hook into.
Both alternatives are bad.
The "software IC" idea taken to it's ultimate conclusion is this:
Develop strongly cohesive components that export generic interfaces to ensure loose coupling, and "freeze" those components - refuse to add any more features, make any interface changes, or adapt them. Limit changes to internals that don't alter the observed behavior, other than fixing bugs and improving performance characteristics.
It's both incredibly powerful, and at first glance incredibly limiting.
Powerful because it means that when you learn a specific "model" of a component, you have every reason to believe it won't break on you. Imagine linking to the same specific version of a library and never upgrading other than selectively for bug fixes.
Incredibly limiting because software people have a feature fetish. We crave adding functionality, and go all "ooh, shiny" whenever we see something cool has been added. And that's fair, at least when it actually is helpful.
I don't want to stop that. I want to take a much more conscious approach to the fact that when DingbatShell goes from version 1.x to 2.x it's a different model - a different product - than the previous version. Upgrading, even if the API seems to stay mostly backwards compatible, requires new rounds of testing and careful review.
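To make that concrete, here is a hedged sketch of how an application could pin itself to one "model" using a Bundler Gemfile; the gem name, repository URL and branch names are hypothetical, just riffing on the DingbatShell example above.

```ruby
# Gemfile - this app links against the frozen 1.x "model" of the component.
# The gem name, URL and branch names are hypothetical examples.
source 'https://rubygems.org'

# Only bug fixes ever land on the model-1.x branch, so this dependency
# never changes behavior underneath the app.
gem 'dingbat_shell', git: 'https://example.com/dingbat_shell.git', branch: 'model-1.x'

# A different app would pin to the 2.x model instead - treated as a separate
# product, with its own round of testing and review before adoption:
# gem 'dingbat_shell', git: 'https://example.com/dingbat_shell.git', branch: 'model-2.x'
```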
Software ICs aren't new - they're called branches and versions
This is the crux of the matter. You've been able to do this "forever" - and some have. But very few take the conscious approach that this applies to the whole stack, including third party libraries, build tools that have any kind of effect on the final product, and so on.
Even fewer extend this to creating a multitude of branches - a new branch for every major "niche" the component is meant to work in, or every major axis of configurability.
A key reason is that in the age of version control systems that have been abysmally bad at merging changes, you really don't want to have to merge a bug fix across 42 different versions of a component.
I'm not sure we're quite there yet, but that's almost what I'm proposing. A vital point is that such changes should be exceedingly rare, exactly because you freeze features regularly, branch off new components, and continue new feature development while leaving the old branches frozen.
Only for critical bug fixes would you be faced with a potentially massive merge job. But if the components remain small and simple that merge job might not be so bad.
This is of course where the new breed of distributed version control systems comes in. Because they're distributed, better merging has been vital. A system like Git is heavily focused around a workflow that, for many users, involves frequent multi-way merges of a very high degree of complexity.
We're finally getting tools that are actually specifically geared towards managing large numbers of branches.
What are the benefits?
Whenever there's a high cost to providing configurability - either in increased complexity or in reduced performance due to complex abstractions - you have a point where it's worth considering a new "model".
You can:
- Simplify the API - configuration options that are needed for only one or the other axis of configuration (say, using a database vs. a set of files as the data source, if the nature of the component is such that it's always either-or) can be left out entirely. A good test for whether splitting a component into branches is worthwhile is to look at how large a part of the API you can prune away, or how many arguments you can remove from methods (see the sketch after this list).
- Improve performance by hardwiring logic that might otherwise go through multiple levels of indirection.
- Massively simplify testing, because the number of permutations of configurations may drop significantly (look for m*n effects, where configuration happens along more than one "axis" and where many combinations may not make sense, but will still need to work - if branching the component makes it possible to test against a single axis at a time, that may be a big win).
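As an illustration of that pruning test, here is a hedged sketch - the class, method and option names are all hypothetical - of a component configurable along a single data-source axis, next to the two simpler "models" it could be split into:

```ruby
# Hypothetical configurable component: one class, one configuration "axis".
class UserStore
  def initialize(options = {})
    @source = options[:source]   # :database or :files
    @db     = options[:db]       # some object responding to #execute
    @dir    = options[:dir]      # directory holding one JSON file per user
  end

  def find(id)
    case @source
    when :database then @db.execute('SELECT * FROM users WHERE id = ?', id).first
    when :files    then File.read(File.join(@dir, "#{id}.json"))
    else raise ArgumentError, "unknown source: #{@source.inspect}"
    end
  end
end

# Split into two "models" on separate branches, the option - and every code
# path and test combination that exists only to support it - disappears.
class DbUserStore
  def initialize(db)
    @db = db
  end

  def find(id)
    @db.execute('SELECT * FROM users WHERE id = ?', id).first
  end
end

class FileUserStore
  def initialize(dir)
    @dir = dir
  end

  def find(id)
    File.read(File.join(@dir, "#{id}.json"))
  end
end
```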
This doesn't work for end users, and not always even for developers
Imagine if an end user had to look through a catalogue to see which model of Gimp has exactly the features they want. It's not going to happen.
This is an approach for developers, and even then it is an approach for relatively small, highly cohesive components.
It is not a panacea. It is not always appropriate.
It's yet another tool, and an approach that I personally will start considering more seriously whenever I get to a point where I want to add more features.
But are you using it?
I have ranted about why I don't like frameworks before, and written a number of small Rack handlers, for example, and the direction I'm increasingly taking for web development is to compose applications out of components designed to be extremely small, cohesive and loosely coupled.
That's the ideal scenario for "software ICs". Rather than adding more features to my dispatch class - features I will quite possibly use for only a fraction of the apps I write - I will add those features to separate branches and pick and choose, keeping whichever version I pick for a new web app extremely simple. If I need a feature from another "model", I'll make another branch from whichever version is the closest match and merge in the feature I want.
Currently, the dispatch class I use for this blog, for example, is about 20 lines. It doesn't need to be more. I just butchered it and removed most of the features it used to have, because I realized that for this app those features were just cruft and bugs waiting to happen. I keep the code around - when and where I do need it, it's there in my repository.
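The actual class isn't reproduced here, but as a hedged sketch of the scale involved - all names below are hypothetical - a Rack dispatcher in that spirit can fit in roughly this much code:

```ruby
# A hypothetical minimal Rack dispatcher, roughly the size described above.
# It maps path prefixes to handler objects that respond to #call(env).
class Dispatch
  def initialize(routes, default)
    @routes  = routes    # e.g. { '/posts' => PostsHandler.new }
    @default = default   # handler used when nothing matches
  end

  def call(env)
    path = env['PATH_INFO']
    _, handler = @routes.find { |prefix, _| path.start_with?(prefix) }
    (handler || @default).call(env)
  end
end

# Usage in config.ru, assuming PostsHandler and NotFoundHandler exist:
#   run Dispatch.new({ '/posts' => PostsHandler.new }, NotFoundHandler.new)
```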
The same is true for other components I use.
It even reduces the need for documentation, because untangling the features from each other has resulted in components that are so trivial to understand that the code does as good a job as a description, and is guaranteed to be much more precise.