Thursday, October 10, 2024

Avoidance

When writing software, the issues you try to avoid often come back to haunt you later.

A good example is error handling. Most people treat it as a boolean: the code either works or it does not. But it is more complicated than that. Sometimes you want the logic to stop and display an error, but sometimes you just want it to pause for a while and try again. Some errors are a full stop, and some are local. In a big distributed system, you also don’t want one component’s failure to domino into a lot of others, so the others should just wait.

This means that when there is an error, you need to think very clearly about how you will handle it. Some errors demand an immediate stop; some should be retried. Some are just logged, while others trigger diagnostics or fallback plans. There may be other strategies too. You usually need some kind of map that binds specific errors to different types of handlers.
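For example, here is a minimal sketch in Python of what such a map might look like. All of the error types and handler names are invented for illustration, not a prescription:

```python
import logging
import time

log = logging.getLogger(__name__)

# Hypothetical error categories for this sketch.
class TransientError(Exception): ...   # pause for a while and try again
class ValidationError(Exception): ...  # local problem, log it and keep going
class FatalError(Exception): ...       # full stop, surface it to the operator

def retry(err, operation):
    """Wait briefly, then try the operation one more time."""
    time.sleep(1.0)
    return operation()

def log_and_skip(err, operation):
    """Record the problem but let processing continue."""
    log.warning("skipping: %s", err)
    return None

def stop_and_report(err, operation):
    """Give up and let the failure propagate to the caller."""
    raise err

# The map that binds specific errors to different types of handlers.
HANDLERS = {
    TransientError: retry,
    ValidationError: log_and_skip,
    FatalError: stop_and_report,
}

def run(operation):
    try:
        return operation()
    except Exception as err:
        handler = HANDLERS.get(type(err), stop_and_report)
        return handler(err, operation)
```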

However, a lot of newer languages give you exceptions. The idea is that instead of deciding right now how you will handle the error, you throw an exception and figure it out later. That’s quite convenient for coding.

It’s just that when you’ve done that hundreds of times, and later never comes, the behavior of the code gets erratic, which people obviously report as a bug.

And if you try to fix it by just treating it as a boolean, you’ll miss the subtleties of proper error handling, so it will keep causing trouble. One simple bit of logic will never correctly cover all of the variations.

Exceptions are good if you have a strong theory about where they go and you are strict enough to always follow it. An example is to catch an exception low and handle it right there, or else let it percolate all the way to the very top. That makes it easy later to double-check that all of the exceptions are doing what you want, because the handling sits in consistent, predictable places.
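A minimal sketch of that kind of discipline in Python, with hypothetical file and function names: exceptions are either caught right where they are expected, or they travel untouched to a single handler at the top.

```python
import logging
import sys

log = logging.getLogger(__name__)

def load_config(path):
    # Caught low: a missing config file is expected, so it is handled right here.
    try:
        with open(path) as f:
            return f.read()
    except FileNotFoundError:
        return ""  # fall back to defaults locally

def process():
    # No try/except here: anything unexpected percolates upward untouched.
    config = load_config("app.conf")
    return len(config)

def main():
    try:
        process()
    except Exception:
        # The single top-level handler: every uncaught exception lands here,
        # so there is one consistent place to log, alert, and shut down cleanly.
        log.exception("unhandled failure, shutting down")
        sys.exit(1)

if __name__ == "__main__":
    main()
```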

But instead, many people litter the code indiscriminately with exceptions, which guarantees that the behavior is unpredictable. Thus, lots of bugs.

So, language features like exceptions let you avoid thinking about error handling, but you’ll pay for it later if you haven’t backed that up with enough discipline to use them consistently.

Another example is data.

All code depends on the data it is handling underneath. If that data is not modeled correctly -- proper structures and representations -- then it will need to be fixed at some point. Which means all of the code that relies on it needs to be fixed too.

A lot of people don’t want to dig into the data and understand it; instead, they start making very loose assumptions about it.

Data is funny in that, in some cases, it can be extraordinarily complicated in the real world. Any digital representation of it then has to be just as complicated. So, people ignore that, assume the data is far simpler than it really is, and end up dealing with endless scope creep.

Incorrectly modeled data diminishes any real value the code provides and usually spawns a whole collection of bugs. The data is the foundation; it sets the base quality for everything built on top.

If you understand the data in great depth then you can model it appropriately in the beginning and the code that sits on top of it will be far more stable. If you defer that to others, they probably won’t catch on to all of the issues, so they over-simplify it. Later, when these deficiencies materialize, a great deal of the code built on top will need to be redone, thus wasting a massive amount of time. Even trying to minimize the code changes through clever hacks will just amplify the problems. Unless you solve them, these types of problems always get worse, not better.
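As a tiny, invented illustration of the difference between guessing at the data and actually modeling it:

```python
from dataclasses import dataclass

# Loose assumption: "a price is just a number." It works until the real
# world shows up with multiple currencies, rounding rules, and float errors.
price = 19.99

# Modeled after actually digging into the data: amounts stay exact,
# and a price is meaningless without its currency.
@dataclass(frozen=True)
class Price:
    amount_cents: int
    currency: str  # e.g. "USD"

price = Price(amount_cents=1999, currency="USD")
```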

Performance is an example too.

The expression “Premature optimization is the root of all evil” is true. You should not spend a lot of time finely optimizing your code, until way later when it has settled down nicely and is not subject to a lot of changes. So, optimizations should always come last. Write it, edit it, test it, battle harden it, then optimize it.

But you can also deoptimize code. Deliberately make it more resource-intensive. For example, you can allocate a huge chunk of memory, only to use it to store a tiny amount of data. The size causes behavioral issues with the operating system; paging a lot of memory is expensive. By writing the code to only use the memory you need, you are not optimizing it, you are just being frugal.
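A contrived Python sketch of the difference:

```python
# Deoptimized: reserve a gigabyte "just in case", then store a few bytes in it.
# The operating system still has to back the whole allocation, and paging it
# around is expensive.
buffer = bytearray(1024 * 1024 * 1024)
buffer[:5] = b"hello"

# Frugal: allocate what the data actually needs, and let it grow if it grows.
buffer = bytearray(b"hello")
```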

There are lots of ways to express the same code so that it doesn’t waste resources but still maintains readability. These are the good habits of coding. They don’t take much extra effort and they are not optimizations. The code keeps the data in reasonable structures, and it does not do a lot of wasteful transformations. It only loops when it needs to, and it does not have repetitive throwaway work. Not only does the code run more efficiently, it is also more readable and far more understandable.

It does take some effort and coordination to do this. The development team should not blindly rewrite stuff everywhere, and you have to spend effort understanding what the existing representations are. This will make you think, learn, and slow you down a little in the beginning. Which is why it is so commonly avoided.

You see these mega-messes where most of the processing of the data is just to pass it between internal code boundaries. Pointless. One component models the data one way, and another wastes a lot of resources flipping it around for no real reason. That is a very common deoptimization; you see it everywhere. Had the programmers not avoided learning the other parts of the system, everything would have worked a whole lot better.
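A toy Python sketch of that kind of churn, with invented components and shapes:

```python
# Component A keeps users as a list of dicts.
users = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]

# Component B insists on (id, name) tuples, so the boundary flips the data...
as_tuples = [(u["id"], u["name"]) for u in users]

# ...and component C immediately flips it back into a dict keyed by id.
by_id = {uid: {"id": uid, "name": name} for uid, name in as_tuples}

# Had the components agreed on one representation up front, both
# conversions would simply disappear.
by_id = {u["id"]: u for u in users}
```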

In software development, it is very often true that the things you avoid thinking about and digging into are the root causes of most of the bugs and drama, and usually the most serious ones. Coming back later to correct these avoidances is often so expensive that it never gets done. Instead, the system hobbles along, getting bad patch after bad patch, until it sort of works and everybody puts in manual processes to counteract its deficiencies. I’ve seen systems that cost far more than the manual processes they replaced and are way less reliable. On top of that, years later everybody gets frustrated with it, commissions a rewrite, and makes the exact same mistakes all over again.
