Thursday, November 11, 2021

Non-destructive Refactoring

In order to evolve code over increasingly sophisticated versions, you have to learn and apply a fairly simple skill that works for any language or tech stack.

It’s called non-destructive refactoring.

Although it has a fancy name, refactoring is really just moving around your lines of code and changing the way you partition them with abstract language concepts like functions, methods, and objects. There are lots of good books and references on refactoring, but few of them do a really good job at explaining why you should use the different techniques at different times.
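As a minimal sketch of what that means (all names here are hypothetical), the same underlying instructions can be partitioned two different ways, with identical results:

```python
# Before: one function mixing validation, aggregation, and formatting.
def report(temps):
    valid = [t for t in temps if -60 <= t <= 60]
    avg = sum(valid) / len(valid)
    return f"avg={avg:.1f}C over {len(valid)} readings"

# After: the same lines of code, repartitioned into named pieces.
def valid_readings(temps):
    return [t for t in temps if -60 <= t <= 60]

def average(values):
    return sum(values) / len(values)

def report_refactored(temps):
    valid = valid_readings(temps)
    return f"avg={average(valid):.1f}C over {len(valid)} readings"
```

Nothing was added or removed; the computation was only moved around, so both versions return the same string for the same input.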

The non-destructive adjective is a directive to constrain the holistic effect of your changes. That is, you take an existing piece of working code, apply a bunch of refactoring to it, and when you are done the code works 100% exactly as it did before. There will be no changes that users, or even other programmers, can or should notice.

All that has happened is that you’ve kept the same underlying instructions to the computer, but you have just moved them around to different places. The final effect of running those instructions is exactly the same as before.

A huge benefit of this is that you can minimize your testing. You don’t have to test new stuff, you just have to make sure what is there now is indistinguishable from what was there before.
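One way to exploit that is a small characterization check: run the old and refactored versions over the same inputs and assert that they are indistinguishable. This is only a sketch with made-up functions, not a prescribed workflow:

```python
def characterize(old_fn, new_fn, inputs):
    """Assert the refactored function behaves exactly like the old one."""
    for x in inputs:
        assert old_fn(x) == new_fn(x), f"behaviour changed for {x!r}"

# Hypothetical pair: an original and its refactored candidate.
old_normalize = lambda s: s.strip().lower()
new_normalize = lambda s: s.lower().strip()

characterize(old_normalize, new_normalize, ["  Hi ", "OK", "", "MiXeD  "])
```

If the refactoring was truly non-destructive, the check passes on every input you already cared about; no new test cases are needed.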

If you are thinking that this is pretty useless, you are tragically mistaken.

The intent in writing code generally starts out really good. But the reality is that as the deadlines approach, programmers get forced to take more and more icky shortcuts. These shortcuts introduce all sorts of problems that we like to refer to as technical debt.

Maybe it’s massive over-the-top functions that combine too much work together. Maybe it’s the opposite, the logic is chopped up into tiny pieces and fragmented all across the source files. Maybe it is a complete lack of error handling. It doesn’t matter, and everyone has their own unique bad habits. But it does mean that as the code gets closer to the release, the quality of the code you are producing degenerates, often rapidly and to pretty low standards.

So, after a rather stressful release, possibly the worst thing you can do is just jump right back in again and keep going from where you left off. Because where you left off was hacking away badly at things in order to just toss the code out there. That is not a reasonable place to restart.

So, instead, you spend a little bit of time doing non-destructive refactoring.

You clean up the names of things, move code up and down in the function stack, add comments and decent error handling, etc. You basically fix all of the bad things you did right at the end and try to get yourself back to the code you wanted, not the code you ended up writing.

If you blurred the lines between layers, you correct that. If you skipped adding a layer, you put it in. If you added way too many layers, you flatten them down somewhat. You stop hardcoding things that would be nice to change and put them into cleaned-up configuration files. You document the things you missed.
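To illustrate the hardcoding point with a hedged example (the file name, keys, and defaults are all made up), a magic number can be pulled out into a configuration file while keeping behaviour identical when no file is present:

```python
import json

# Before (hypothetical): retry_limit = 3 was buried deep in the fetch logic.

# After: the value lives in a cleaned-up configuration file, with the old
# hardcoded values kept as defaults so behaviour is unchanged out of the box.
DEFAULTS = {"retry_limit": 3, "timeout_seconds": 30}

def load_config(path="app_config.json"):
    try:
        with open(path) as f:
            return {**DEFAULTS, **json.load(f)}
    except FileNotFoundError:
        return dict(DEFAULTS)  # no file: exactly the old hardcoded behaviour

config = load_config()
```

Keeping the old values as defaults is what makes this non-destructive: nothing observable changes until someone deliberately writes a config file.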

If you have multiple copies of the same function all over the place, you pick one or merge the best parts, then you spend the time to point all of the other code back to that correct version. That’s one of the few places where this type of work isn’t truly non-destructive. One copy of the function may have been incorrect, so by repointing its callers to something better you are accidentally fixing a bug or two. Because of that, you need to go up to everything that called that function and double-check that it was really calling it properly.
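For instance (with hypothetical duplicate functions), merging two near-copies and repointing the old names can silently fix a bug in the weaker copy, which is exactly why the callers need re-checking:

```python
# Two near-duplicate copies that drifted apart across modules (hypothetical):
def slug_reports(title):             # lived in reports.py; forgot to strip
    return title.lower().replace(" ", "-")

def slug_billing(title):             # lived in billing.py
    return title.strip().lower().replace(" ", "-")

# Merge the best parts into one canonical version...
def slugify(title):
    return title.strip().lower().replace(" ", "-")

# ...then repoint the old names while callers are migrated.
# Note: any caller relying on slug_reports' missing strip() just changed
# behaviour, so every call site has to be re-checked.
slug_reports = slugify
slug_billing = slugify
```

Here `slug_reports(" Hello World")` used to return `"-hello-world"`; after the merge it returns `"hello-world"`, a bug fix that arrived as a side effect of the consolidation.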

If you know that your next big work is building on top of your foundations, you rearrange things to make that work easier but restrict it to only changes that are non-destructive.

You take the time to do the right thing and to attend to all of the little details that you had to forget about earlier. You clean up your work first before you dive back in again.

If you keep doing this, even if you never seem to find enough time to do all of the work you know you should do, you will find that, cleanup session after cleanup session, you are gradually moving in the right direction, and that the more often you do this, the less work you end up having to do for each release. Really messy code imposes a massive amount of friction that slows you down a lot. That friction is gone if the code is clean, so any work to get the code cleaner is also working to save you time and pain in the not-too-distant future.

Once you’ve mastered non-destructive refactoring, the other major skill is to extend what is already there (which is pretty clean because you’ve made sure it is) instead of just bolting on more junk to the side. That is another super-strong habit that really helps to keep development projects ‘in control’ and thus makes them a lot more fun to work on.

Sunday, November 7, 2021

It’s Slow, We Should Rewrite it

Whenever a programmer says “it’s slow, we should rewrite it” there is probably like an 80% chance that that programmer doesn’t know what they are talking about. As a refrain, it’s pretty close to “works on my machine” for professional malfeasance.

The first issue is that X amount of work cannot be done in any less than X amount of time. In some super cool rare situations, one can achieve a logarithmic reduction of the work, but those types of optimizations do not come easily. It might be the case that the code is de-optimized, but it would be foolish to assume that without actually doing some performance testing.
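A classic example of such a logarithmic reduction is replacing a linear scan with binary search, and it only works because the data is already sorted, which is precisely the kind of precondition that makes these optimizations hard to come by. A sketch:

```python
from bisect import bisect_left

def linear_find(sorted_vals, target):
    """O(n): examine every element until a match is found."""
    for i, v in enumerate(sorted_vals):
        if v == target:
            return i
    return -1

def binary_find(sorted_vals, target):
    """O(log n): halve the search space each step; requires sorted input."""
    i = bisect_left(sorted_vals, target)
    if i < len(sorted_vals) and sorted_vals[i] == target:
        return i
    return -1
```

Both return the same answer on sorted input; the second just does exponentially less work as the list grows. Without that sorted-input guarantee there is no free speedup, which is why profiling beats assuming.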

The driving issue though is usually that the person making that suggestion does not want to actually go and spend time reading the older code. They like to write stuff, but they struggle to read it. So, as their offensive argument, they pick some attribute that is visible outside of the code itself, proclaim that it is the real problem, and use that as their justification for throwing away someone else's work and starting over from scratch, without bothering to validate any of their assumptions.

Sometimes they have arrived at that argument based on a misunderstanding of the underlying technologies. They often assume that newer technologies are inherently better than older ones. Ironically, that is rarely the case in practice: the software crisis dictates that later programmers understand less of what they are actually doing, so it’s less likely that their repeated efforts will be better in general. It is true that some newer knowledge has been gained, which might feed into improved macro or micro optimizations, but those benefits can be lost to far more de-optimizations, so you can see why that is a really bad assumption. On top of all of that, the software industry has been rather stagnant on actual innovation for a long time now; most supposedly new technologies are just variations on older, already existing themes. It just cycles around endlessly these days. Whatever is old is new again.

With all of that added up, you can’t say that an implementation in tech stack A would be faster or better than one in B. It’s not that simple. That’s been true for decades now. There are some pretty famous cases of people going right back to older technologies and using them to get far better results. The tech stack matters for other reasons, but usually not for performance or quality.

About the only thing you can say about one implementation is that it is a whole lot messier and disorganized than another. That the quality of work is poor. It’s just a pile of stuff hastily thrown together. But you cannot know that unless you actually dive in and read, then understand the code itself. You can’t look at 5% of it and draw that conclusion. And any outside behaviors are not enough to make those types of assertions.

Overall it is rarely ever a good idea to rewrite software anymore. There are times and situations when that changes, but it hasn’t been the default for a long, long time. The best alternative is to start refactoring the code so that you keep all of the knowledge that has already built up in it, and learn to leverage that into something that exceeds the scope and quality of the original code. You can’t do that by refusing to read it, or by ignoring the knowledge that went into it. If the code was in active development for a decade, then rewriting it would set you back roughly ten years multiplied by the number of programmers involved over that period. That is a huge backslide in person-years, and highly unlikely to be successful in any time frame. It takes an incredibly long time to slowly build up code, so even if it isn’t perfect, it represents a lot of time and knowledge. You need to mine and leverage that work, not just toss it blindly into a dustbin. It’s quite possible that the coders who wrote that original work were a lot better at their jobs than you are at yours.