Giant balls of typeless source files

Cedric posted this entry about an article by Steve Yegge called Dynamic languages strike back (it's slightly interesting, but nothing really new if you already like dynamic languages, though the section on trace trees is well worth a read). Most of Cedric's comments were unremarkable, but this one made me cringe:

What will keep preventing dynamically typed languages from displacing statically typed ones in large scale software is not performance, it's the simple fact that it's impossible to make sense of a giant ball of typeless source files, which causes automatic refactorings to be unreliable, hence hardly applicable, which in turn makes developers scared of refactoring. And it's all downhill from there. Hello bit rot.

First of all, let me be generous and assume that by "typeless" he meant source files without type annotations, because most prominent dynamic languages are strongly typed. In Ruby, for example, everything is an object, and every object has a type and a class. (Note the distinction: a Ruby object has a type that may or may not coincide with its class, because of eigenclasses, also called singleton classes, which let you extend a single object without changing its class.) This makes the argument slightly more tenable, but not by much. Type annotations, or the lack of them, are not what distinguishes static languages from dynamic ones. Many static languages, such as Haskell, can forgo most (in some cases all) type annotations thanks to good type inference engines.

But one of the things I love about Ruby is the lack of unnatural type restrictions. In fact, the few times I've come across code with explicit type checks (using #is_a? etc.), they've usually been a hindrance rather than a help. The implementor of the class in question made assumptions about which classes could be passed in as input, and those assumptions were simply more restrictive than they needed to be.

The "giant balls of typeless source files" have yet to be a problem for me. For a start, they're not giant balls: I routinely find myself writing far less code to achieve the same goals than I would in most other languages I know. But more than that, I end up writing code where type largely doesn't matter, and where the small bits of type that do matter are clearly documented. Those bits are NOT type or class names, but things like this:

Argument "foo" must implement #each to iterate over a collection.
Argument "bar" must support Comparable.
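To make the eigenclass point concrete, here is a minimal sketch with made-up names (Greeter, Shouty and wave are purely illustrative, not from any library). One object gets extra behaviour through its singleton class while its class stays Greeter. As a result, is_a? reports a type that is more than the class alone:

```ruby
# Hypothetical class and module, purely for illustration.
class Greeter
  def greet
    "hello"
  end
end

module Shouty
  # Wraps the original #greet via super.
  def greet
    super.upcase + "!"
  end
end

plain = Greeter.new
loud  = Greeter.new

# extend mixes Shouty into loud's singleton class (eigenclass) only;
# Greeter itself and every other instance are left untouched.
loud.extend(Shouty)

# A singleton method defined on one single object.
def plain.wave
  "waves"
end

puts plain.greet               # hello
puts loud.greet                # HELLO!
puts loud.class                # Greeter (same class as plain)
puts loud.is_a?(Shouty)        # true
puts plain.is_a?(Shouty)       # false
puts Greeter.include?(Shouty)  # false
puts plain.respond_to?(:wave)  # true
puts loud.respond_to?(:wave)   # false
```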
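The difference between a class check and a documented contract like the two above fits in a few lines. In this sketch, largest, largest_strict and Version are hypothetical names made up for the example. The strict variant mimics the kind of #is_a? check complained about earlier; the other one relies only on #each and Comparable:

```ruby
require "set"

# Overly restrictive: rejects anything that isn't literally an Array,
# even though the method only ever iterates over its argument.
def largest_strict(items)
  raise TypeError, "items must be an Array" unless items.is_a?(Array)
  largest(items)
end

# Duck-typed version. The documented contract is not a class name:
#   - "items" must implement #each to iterate over a collection
#   - its elements must support Comparable (only #> is used)
def largest(items)
  best = nil
  items.each do |item|
    best = item if best.nil? || item > best
  end
  best
end

# Hypothetical value type: it qualifies because it includes Comparable.
class Version
  include Comparable
  attr_reader :major, :minor

  def initialize(major, minor)
    @major, @minor = major, minor
  end

  def <=>(other)
    [major, minor] <=> [other.major, other.minor]
  end

  def to_s
    "#{major}.#{minor}"
  end
end

p largest([3, 9, 4])     # 9
p largest(Set[3, 9, 4])  # 9 (a Set is not an Array, but it has #each)
p largest(1..5)          # 5 (so does a Range)
puts largest([Version.new(1, 2), Version.new(1, 10), Version.new(0, 9)])  # 1.10

begin
  largest_strict(Set[3, 9, 4])
rescue TypeError => e
  puts "largest_strict: #{e.message}"  # largest_strict: items must be an Array
end
```

Nothing in largest names a class, so any object that honours the two documented requirements works, including types written long after the method was.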

Coupling tends to be much lower, which lets me reason about, and test, the code in much smaller units. That in turn means I rarely, if ever, need to consider "giant balls" of source files at all. Most of my Ruby applications are on the order of a few hundred to a few thousand lines of code, but they build on a shitload of libraries. And while I like having their source in case I need a new feature, I can honestly say that I've never looked at the source of most of them except out of curiosity about how they do something. Most of the time I won't look at the source or even the documentation. I'll be doing something like this:

$ irb -rmodel
irb(main):001:0> (Item.methods - Object.methods).sort
=> ["[]", "add_hook", "after_create", "after_destroy", "after_initialize", "after_save", "after_update", "all_association_reflections", "associate", "association_reflection", "associations", "before_create", "before_destroy", "before_save", "before_update", "belongs_to", "cache_key_from_values", "cache_store", "cache_ttl", "columns", "create", "create_table", "create_table!", "create_with", "create_with_params", "database_opened", "dataset", "db", "db=", "def_hook_method", "delete_all", "destroy_all", "drop_table", "fetch", "find", "find_or_create", "has_and_belongs_to_many", "has_hooks?", "has_many", "has_validations?", "hooks", "implicit_table_name", "is", "is_a", "is_dataset_magic_method?", "join", "load", "many_to_many", "many_to_one", "method_missing", "no_primary_key", "one_to_many", "one_to_one", "plugin_gem", "plugin_module", "primary_key", "primary_key_hash", "schema", "serialize", "set_cache", "set_cache_ttl", "set_dataset", "set_primary_key", "set_schema", "skip_superclass_validations", "subset", "super_dataset", "table_exists?", "table_name", "validate", "validates", "validates_acceptance_of", "validates_confirmation_of", "validates_each", "validates_format_of", "validates_length_of", "validates_numericality_of", "validates_presence_of", "validations"]
irb(main):002:0>

... and I tend to find what I'm looking for. So far I haven't once felt even the slightest need for a refactoring tool with Ruby. I've never done a mass renaming of methods in Ruby that spanned more than a single file, or where Emacs search-and-replace wasn't more than sufficient. There's an impedance mismatch here: the idea of what is necessary just doesn't match the reality for a lot of people working with dynamic languages. I don't want a refactoring tool. It doesn't even make my top 10 of features or functionality I'd like to have for Ruby, to the point where I've never checked whether one already exists.

Does that mean I don't refactor? No, it doesn't. But refactoring is pretty trivial, and not something that needs tool support, when the module you're working on is measured in hundreds of lines and a handful of files, and is meaningfully separate from, not coupled to, the rest of your code.

What it boils down to is that the very need for advanced refactoring tools is a big red flashing warning sign. It means the language has failed to make life easy for you and/or you have failed as a designer. It's one of the reasons I truly loathe highly coupled systems: coupling is a design smell I've had to endure enough in the past that I'll go to great lengths to avoid it. The result is better software, with far more reusability and maintainability. And systems without giant balls of source files, typeless or not.
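The irb trick in the session above works on any class, not just library models. Here is a self-contained sketch with a hypothetical Item stand-in. In the post, Item is a model class loaded from the author's own model file via irb -rmodel. Ruby 1.9 and later return method names as symbols, whereas the session above shows strings:

```ruby
# Hypothetical stand-in for a model class; the method names are illustrative only.
class Item
  def self.find(_id)
    nil
  end

  def self.table_name
    :items
  end

  def self.validates_presence_of(*_fields)
    nil
  end

  def save
    true
  end

  def total
    0
  end
end

# Class-level methods Item adds beyond those every class already has.
p((Item.methods - Object.methods).sort)
# => [:find, :table_name, :validates_presence_of]

# The same subtraction for instance methods.
p((Item.instance_methods - Object.instance_methods).sort)
# => [:save, :total]

# Narrow a long list down with a pattern.
p((Item.methods - Object.methods).grep(/valid/))
# => [:validates_presence_of]
```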

Giant balls of typeless source files 2008-05-13