The Programmer's Paradox: Defensive Coding: Minimal Code

Sometimes you come across really beautify code. It’s clear and concise. It’s obvious how it works. If you have to edit it, it is intuitive where the changes should go. It looks super-simple. It’s a great piece of work.

Most people don’t realize that getting code to look super-simple is a lot of effort and a huge challenge. Just splatting out any initial version is ugly. It takes a lot of thought, refinement and editing work to get it looking great.

All code degrades with time and changes. If it starts out good, it will get tarnished but should hold its value. If it is ugly on day one, it will be a pit of despair a year later.

One way of approaching the problem is to equate super-simple code with the act of minimizing some of the variations until we come down to one with reasonable tradeoffs. We can list out most of these variations.

Minimize:

The number of variables
The length of a ‘readable’ name
The number of external jumps needed in order to understand the code
The effort to understand a conditional
The number of flow constructs, such as if statements and for loop
The number of overlapping logic paths
The number of hardcoded constants
The number of disjoint topics
The number of layers
The number of reader’s questions
The number of possible different behaviors

We’ll go through each of them accordingly.

Variables

We obviously don’t want to have the code littered with useless variables. But we also don’t want the ‘same data’ stored in multiple places. We don’t want to overload the meaning of a variable either. And a little less obvious, if there are several dependent variables, we want to bind them together as one thing, and move it all around as just one thing.

Readable Names

We want the shortest, longest name possible. That is, for readability we want to spell everything out in its full detail, but when and where there are different options for that, we want to choose the shortest of them. We don’t want to make up acronyms, we don’t want to make up or misused words, and we certainly don’t want to decorate the names with other attributes or just arbitrarily truncate them. The names should be correct, we don’t want to lie. If the names are good, we need less documentation.

External Jumps

If you can just read the code, without having to jump all over the code base, that is really good. It’s self-contained and entirely under control. If you have to bounce all over the place to figure out what is really happening then that is spaghetti code. It doesn’t matter why you have to bounce, just that you have to do it to get an understanding of how that block of code will work.

Conditionals

Sometimes people create negative conditionals that end up getting processed as double negatives. Sometimes people see the parts of the condition getting spread across a number of different variables. This can be confusing. Conditionals should be easy to understand, so when they aren’t they should be offloaded into a function that is. So, if you have to check 3 variables for 7 different values, then you certainly don’t want to do that directly in an ‘if’ statement. If the function you need to call requires all three variables, and a couple of the values passed, you probably have too many variables. The inputs to a conditional check function shouldn’t be that complex.

Flow of Control

There is some minimum structural logic that is necessary for a reasonable computation. This is different than performance optimizations, in that code with unnecessary branches and loops is just wasting effort. So if you loop through an array, find one part of it, then loop through it again to find the other part, that is ‘deoptimized’. By fixing it, you are just getting rid of bad code, but still not optimizing what the code is doing. It’s not uncommon in ugly code to see that a more careful construction could have avoided at least half of all of the flow constructs, if not more. When those useless constructs go, what is left is way more understandable.

Overlapping Logic

A messy part of most programming languages is error handling. It can be easily abused to craft blocks of code that have a large number of different exit points. Some necessary error handling supports multiple different conditions that are handled differently, but most error handling is rather boolean. One can mix the main logic with boolean handling and still have it readable. For more sophisticated approaches, the base code and error handling usually need to be split apart in order to keep it simple.

Hardcoded Constants

Once people grew frustrated by continually hitting arbitrary limits where the programmers made a bad choice, we moved away from sticking constants right into the code. Modern code however has forgotten this and has returned to hardcoding all sorts of bad stuff. On rare occasions, it might be necessary, but it always needs to be justified. Most of the inputs to the code should come through the arguments to the function call whenever possible.

Disjoint Topics

You can take two very specific functions and jam them into one bigger function declaration. The code for each addresses a different ‘topic’, they should be separated, they shouldn’t be together. Minimizing the number of functions in code is a very bad idea, functions are cheap but they are also the basis of readability. Each function should fully address its topic, the code at any given level should all be localized.

Layers

The code should be layered. But taken to the extreme, some layers are useless and not adding any value. Get rid of them. Over minimization is bad too. If there are no layers, then the code tends towards a suer-large jumbled mess of stuff at cross-purposes. It may seem easier to read to some people since all of the underlying details are smacked together, but really it is not, since there is too much noise included. Coding massive lists of instructions exactly as a massive list is the ‘brute force’ way of building stuff. It works for small programs but goes bad quickly because of collisions as the code base grows.

Reader’s Questions

When someone reads your code, they will have lots of questions, Some things will be obvious, some can be guessed at given by the context, but somethings are just a mystery. Code doesn’t want or need mysteries, so it is quite possible for the programmer to nicely answer these questions. Comments, comment blocks, naming, and packaging all help to resolve questions. If it’s not obvious, it should be.

Different Behaviors

In some systems, there are a lot of interdependent options that can be manipulated by the users. If that optionality scrambles the code, then it was handled badly. If it’s really an indecipherable mess, then it is fundamentally untestable, and as such is not production worthy code. The options can be handled in advance by moving them to a smaller set of more reasonable parameters, or polymorphism can be used so that the major permutations fall down into specific blocks of code. Either way, giving the users lots of choices should also not give them lots of bugs.

Summary

There are probably a few more variations, but this is a good start.

If you minimize everything in this list, the code will not only be beautiful, but readable, have way fewer bugs and people can keep extending it easily in the future.

It’s worth noting that no one on the planet can type in perfect code the first time, directly from their memory. It takes work to minimize this kinda stuff, and some of the constraints conflict with each other. So, everyone’s expectation should be that you type out a rough version of the code first, fix the obvious problems, and then start gradually working that into a beautify version. When asked for an estimate, you include the time necessary to write the initial code but also the time necessary to clean it up and test it properly. If there is some unbreakable time panic, you make sure that people know that what is going into production is ugly and only a ‘partial’ answer and still needs refining.

The Programmer's Paradox

Saturday, August 15, 2020

Defensive Coding: Minimal Code