Friday, November 19, 2010

The Myth of Code Reuse

Perhaps the most maligned of all programming techniques is code reuse. The idea is simple: if you’ve written some code, tested it, and then managed to get it battle-tested in real-life scenarios, it is far more efficient to spend a little time to generalize that work and then re-apply it for other uses. It leverages your initial effort. It would be even more effective if you did this little extra in advance, knowing that it would get re-used. Why do the work over and over again if you’ve already done it once?

There must be at least a trillion lines of code already out there. We’re writing it faster than ever and there are more people whacking away at their keyboards than ever before (excluding the notorious dot-com bomb blip). Huge effort goes into software development, and yet most of the systems being worked on are either small enhancements of older systems, or just shallow re-writes. Most of the code being written now has been written many times before. Most of the work has already been done once.

Still, with all of that effort being applied and with so many accepted principles being floated around like DRY and “don’t re-invent the wheel”, most of the code in most of the systems I’ve seen -- again and again -- is highly redundant. It’s not uncommon to see a medium-sized system with fifty to one hundred screens where each screen, although visually similar, has its own unique, nearly independent code base. And all of the database access code is equally redundant. In fact, very little is actually shared at all, even though most of it does basically the same thing as an existing piece in the same way. It is normal to see any sizable system with massive amounts of redundant code.
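To make the database side of that concrete, here is a minimal sketch of what sharing the access code could look like: one function that every screen calls, instead of a hand-written query per screen. The table names, column names, and the `fetch_rows` helper are all hypothetical, made up for illustration, and it assumes nothing more than Python's built-in `sqlite3` module.

```python
import sqlite3

# Hypothetical schema for illustration only: these tables and columns are assumptions.
ALLOWED = {
    "customers": ("id", "name", "email"),
    "orders": ("id", "customer_id", "total"),
}


def fetch_rows(conn, table, **filters):
    """Return rows from `table` as dicts, filtered by equality on the given columns."""
    columns = ALLOWED[table]  # identifiers only ever come from the whitelist above
    for key in filters:
        if key not in columns:
            raise ValueError(f"unknown column {key!r} for table {table!r}")
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    if filters:
        # Values are bound as parameters, never pasted into the SQL string.
        sql += " WHERE " + " AND ".join(f"{key} = ?" for key in filters)
    cursor = conn.execute(sql, tuple(filters.values()))
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


def demo():
    """Build a throwaway in-memory database and read two tables through one code path."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
    conn.execute("INSERT INTO customers VALUES (1, 'Ada', 'ada@example.com')")
    conn.execute("INSERT INTO orders VALUES (10, 1, 25.0)")
    conn.execute("INSERT INTO orders VALUES (11, 2, 40.0)")
    return fetch_rows(conn, "customers"), fetch_rows(conn, "orders", customer_id=1)
```

The point is not this particular helper. It is that a fix to the query logic, say in how the filters are built, lands in one place rather than in fifty nearly identical copies.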

So what is going on? Technically, there is no reason why there should be so much repetition. The computers don’t need it, or even care. The tools exist to avoid it. There is also a lot of good advice suggesting that we don’t repeat our code over and over again. In fact, the whole premise behind the Object-Oriented movement was to build good coding techniques right into the language to facilitate better code. Objects weren’t just some arbitrary abstraction, they were intended to push the programmers closer towards encapsulating the related bits of their code with their data.

A key reason programmers give for massive duplication is that they don’t want to waste time over-generalizing the problem. Their fear is that it will become a time sink, crippling the effort while they try to knock the bugs out of very complicated pieces.

But instead, they choose to code as fast as possible with little intelligence. This inherent frailty forces them to constantly revisit each old section, as any new code has quickly made that part of the system inconsistent. With each change, this generates a new slew of bugs. Fast redundant growth manifests itself in bugs and bad interfaces.

I’ve heard of big development efforts with thousands upon thousands of bugs currently being tracked. This type of technical debt explosion doesn’t happen with a small, tightly wound code base. It is caused by not properly encapsulating the solutions, so that instead of getting a piece done and setting it aside, the developers are constantly forced to re-visit everything. The inevitable inconsistencies driven by rampant duplication force the problems to pop back, over and over again, onto the bug list. Bug fixing grows exponentially, killing off development.

If you put some effort into getting a piece of code right, then it should stay that way if it was properly encapsulated. It only changes if you’re extending that part of the system, otherwise you should be able to trust that it is there and it works. Big bug lists are symptomatic of massive duplication and poor encapsulation.

Another reason that programmers give for avoiding reuse is that they don’t want the code to be too complicated. If it is stupidly simple, it is easier to work on. That’s kind of a funny issue, because most coders are fascinated by complexity and there is a strong culture of trying to not make the work of programming too rote. Many feel it should be unique and creative. Ironically, they struggle to get the freedom to think deeply about problems, only to then, in practice, just pound out the most thoughtless thing that barely works.

A brute force coding style is a rather mindless activity once you get past the initial learning stage. You don’t have to think about it, since you’re just splatting out the same basic patterns with the same basic tricks, again and again. Good for schedules, bad for technical debt.

But if you factor in the growing mountain of debt, you quickly realize that any rapid application development that happens will eventually be swallowed by a sea of misery in trying to harmonize all of the little inconsistencies in the reams of redundancy. And, since the pressure to grow will only get worse as the work progresses, the project is destined to crash into a wall, or get canned, or just get re-written again for the third or fourth time. Rapid progress is only good if it isn’t also generating rapid technical debt. And usually it is.

There are lots of other excuses programmers give, but I actually think the real underlying reason reuse fails in practice is simple. Programmers don’t like reading other programmers’ code. Few that I know will just drop into some other code base and try to make sense of it. Most would rather struggle on with incomplete documentation or random experimentation than just pull up the source and read how it works. Code is viewed as mysterious, impenetrable.

It’s pretty funny because the ability to read the code was the foundation of the Open Source movement. Richard Stallman wasn’t trying to devalue the price of software, rather he was just frustrated by not knowing what was going on below. The Open in the name references the ability of any dependent programmers to get access to the source code, not to be able to use the stuff for free. But like all great ideas in software that started out OK, once it hit the masses its simple concept became badly diluted and derailed.

It is made worse because most programmers are essentially self-taught. Schools don’t know how to teach programmers and very few coders have ever had a mentor that they respected long enough to be able to learn from. Because of that, most programmers tend towards an eclectic programming style. They get some of the basics down and then they close their minds, thinking that what little they know is all they should know.

Spending all day in black and white logic has a bad tendency to make people try to apply that rigour to the rest of the world around them, even when they know it doesn’t fit. It’s an ever-present programmer’s disease. You can really see it if you get a programmer talking about something very gray like politics. You’ll often find they have a very closed, hard, over-simplified view that they insist is the absolute truth, and they can’t understand why everyone else can’t just see it as such. A closed mind, with limited but highly sticky information and far too much faith that it is both correct and complete, is a common trait in the industry.

And it seems that it is exactly this eclectic set of beliefs that not only fuels the silly arguments on the web, but also keeps many programmers from actually learning from others’ work. Programmers have a startling lack of desire to learn more programming techniques. There are what, over a million programmers out there working professionally, but the market for books and blogs is tiny. There are all sorts of blogs that show how to use a few library calls, or gossip about the big companies, but there are few that dare to try and teach good coding style. There is an entire massive industry of consultants dedicated towards finding “fun” things to do while programming, without any concern at all for the “right” things to do. It has become an industry of poorly trained programmers who are happy to be that way.

If you go back a few decades there were far more people interested in what makes programming tick and how to build reliable systems. Donald Knuth’s landmark epic about all of the data-structure techniques wouldn’t even raise an eyebrow today. We share little fragments, perhaps clever tricks, but we’re no farther along these days in understanding what really helps in building complex things, and with complexity rising fast, more and more young programmers know less and less about what really happens underneath their instructions. There is a fundamental disconnect. The software crisis that started fifty years ago has turned into a full-blown pandemic.

At the root of this the real problem remains. Programmers should read code. Lots of it. They should be actively seeking to improve their code reading skills by examining what others have managed to do. They should scan existing code bases, and spend some time thinking about what works and what doesn’t. What is readable and what is horrible. They shouldn’t need a tour guide to examine an algorithm, nor should they rely on some hastily written summary of the internals. Instead, the only real way to learn, the only real documentation, is the code. It speaks for itself. For all it has to offer and for all of its flaws.

The only way to become a better programmer is to learn from the experience of others, and the only real way to do that is to be able to read their code. Once a programmer gets over this hurdle, it is just a small sprint to being able to share code with others. There is no need for documentation or PowerPoint slides. There is no need for huge essays or out-of-date reference material. The more code you read, the more styles you see, the more abstractions you encounter, the easier it gets. You don’t have to spend months re-inventing the wheel, or countless hours just hacking at something. You have the ability, if you have access to the source, to work through the issues yourself. It is a hugely powerful skill-set. And if you work on a team of programmers who can all read each other’s code, then they can share their knowledge, their experience and, most of all, the code itself.

Reuse is possible. I have seen it many times, always employed successfully. It not only cuts down on testing, but it also enforces a necessary consistency at the higher levels that makes the tools more usable. Programmers find all sorts of reasons why they can’t do it, even though it is clearly one of the most important skills that they can learn. Not only that, but it also appears as if it is being done far less often by each new generation of programmers. Most programmers are hoping that the rising tide of external code, in the form of libraries and frameworks, takes care of the issue for them. However, most of these bits of code only focus on the smaller, naturally tighter, and usually more fun to code technical issues. Reusing technical code is good, but the real value comes from doing it at the domain and application level. After all, there is far more duplication in the application screens and the database access than there is in the small part of the program that does something specialized like reading JPEG images. The best place to apply reuse is the one that sees it the least often.
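As a rough illustration of reuse at the application level, here is a small sketch in which each screen is just a list of fields, and a single renderer and a single validator serve all of them. Everything in it is hypothetical: the `Field` class, the two screen definitions, and the plain-text output stand in for whatever a real system would use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """One input on a screen (hypothetical structure, for illustration only)."""
    name: str
    label: str
    required: bool = False


def render_form(title, fields, values):
    """Render any screen as text; layout and required-field marking live here only."""
    lines = [title, "=" * len(title)]
    width = max(len(field.label) for field in fields)
    for field in fields:
        marker = "*" if field.required else " "
        value = values.get(field.name, "")
        lines.append(f"{field.label.ljust(width)} {marker}: {value}")
    return "\n".join(lines)


def missing_required(fields, values):
    """Return the names of required fields that are empty or absent."""
    return [field.name for field in fields if field.required and not values.get(field.name)]


# Each screen is data, not its own code base.
CUSTOMER_SCREEN = [Field("name", "Name", required=True), Field("email", "Email")]
INVOICE_SCREEN = [
    Field("number", "Invoice #", required=True),
    Field("total", "Total", required=True),
]
```

Adding the fifty-first screen then means writing another short list of fields, and every screen picks up any improvement to the renderer or the validator for free, which is where the consistency comes from.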