Thursday, March 30, 2023

Strong Coding Habits

It may be easy to dismiss all coding standards and conventions as subjective, but that’s not actually true.

No matter what you do in code, it has an effect and a bunch of side effects. If you balance out your work to align the side effects, they tend to collect together and really help make things better overall.

We’ve often seen this with popular trends in programming: one generation does things a particular way, then the next generation, seeking to differentiate itself, arbitrarily changes that. Oddly, the newer, more popular way of doing things isn’t always better; in fact, maybe 50% of the time it is worse. The takeaway here is that just because something is popular right now, it doesn’t mean doing it that way is a good idea. You have to be objective about the work and why you are doing it. You shouldn’t just blindly follow some stupid trend.

With that in mind, I wanted to talk about some of the older, more defensive ways of coding, and how they really help you to write code that tends towards better quality.

The first thing comes from some older philosophies about pre-conditions and post-conditions. I think it was related to aspect-oriented programming and design by contract, but we had been using it long before those ideas surfaced. It’s a stricter version of Procedural programming, really.

The start is to use lots of functions. Lots of them. Keep them small, keep them tight. Every function should just do one thing and do it well.

The primary reason that this is a good idea is that when you write some code, it is very unlikely that you’ll just write it once, perfectly, and it will hang around forever. So, instead, we need to assume that our first draft is nothing more than that. It’s just a roughed-in version of the code, and over time we’ll need to make it a lot better. Add better error handling, enhance its capabilities, etc. Code isn’t something you “write”, it is something you “evolve”.

If you have a lot of little functions and each one encapsulates something useful, then later when you are playing with the code or extending it, or dealing with unwanted behaviors, it is easier to work at a slightly higher level over and around the pieces, aka the functions.

Maybe you thought the 3 things you needed to do were order independent, then found out later that you put them in backward. If all 3 things are interleaved into a giant mess, that is a lot of work to fix. If it’s just calling 3 functions, then it is trivial.
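As a rough sketch of what that looks like, here is a tiny text-cleaning example in Python. The function names and the task itself are made up for illustration; the point is only that the top-level function is nothing more than three calls in a particular order.

```python
def trim_whitespace(lines):
    """Remove leading and trailing whitespace from each line."""
    return [line.strip() for line in lines]


def strip_blank_lines(lines):
    """Drop lines that are empty or contain only whitespace."""
    return [line for line in lines if line.strip()]


def lowercase(lines):
    """Convert every line to lower case."""
    return [line.lower() for line in lines]


def clean(lines):
    # The whole "algorithm" is just the order of these three calls.
    # If the order turns out to be wrong, swapping two lines fixes it.
    lines = trim_whitespace(lines)
    lines = strip_blank_lines(lines)
    lines = lowercase(lines)
    return lines
```

If the three steps had been written as one interleaved loop, changing the order would mean untangling the loop first.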

So, we want that malleability, at the very small cost of executing a lot of different functions. Basically, we can see functions as some trivial syntax that wraps what you are doing. Well, almost.

Because you now have to come up with 3 unique function names, but that is actually a good thing. You know what the 3 steps are -- you can explain them to your friends -- so what words would you use in that explanation? Those words are the roots of your names; just make sure you squish them as hard as you can without losing their meaning. If you do that, and the names match the code, then the names are self-describing, and that helps later when you revisit the work.

Inside of the functions, you generally have a bunch of input variables and one or more output ones. The best functions are effectively stateless, that is, everything that they touch is passed in through their arguments. They do not reference, or fiddle with anything that is global, or even scoped beyond their borders. That is a little misleading in the OO world, in that they should be touching class variables, but even there you can be tight on their behavior. For instance, some of the functions just need the class variables for read-only usage. Some might write, or update them. Keep those types of functions separate from each other.
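Here is a small Python sketch of that separation. The `Cart` class and `apply_discount` function are hypothetical, invented just to show one stateless function, one read-only method, and one method that writes.

```python
def apply_discount(price, rate):
    """Stateless: the result depends only on the arguments."""
    return price * (1 - rate)


class Cart:
    def __init__(self):
        self.prices = []

    # Read-only: looks at the class variables, never changes them.
    def total(self, discount_rate=0.0):
        return sum(apply_discount(p, discount_rate) for p in self.prices)

    # Writer: changes self.prices, and is kept apart from the readers.
    def add(self, price):
        self.prices.append(price)
```

When something in `self.prices` looks wrong later, only the writers need to be inspected; the readers can be ruled out at a glance.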

In a function, it is possible that some of the incoming data is crap. So when we are running code in development, we really do want it to stop as early as possible on bad data. That makes it easier to figure out what went wrong: you don’t have to go backwards through the stack to find the culprit, it was just “one” bad call.

In production, however, for some types of systems, we want to be liberal. That is, the code should blunder through the work if possible. We do that so that bugs that escaped into production can be worked around by the users. If the code stops on the first data problem, it’s over. The whole thing stops. But if it kinda works with a bug, then the users have options. That is why we want two very different behaviors for bad data. In dev, it is strict, and in prod it is lenient. A good build environment should differentiate between dev and prod releases at minimum. Then we can switch the behavior as needed.
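One possible way to get both behaviors from the same code is a single check helper that looks at the environment. This is only a sketch: the `APP_ENV` variable, the `check` helper, the `BadDataError` class, and the fallback value in `average` are all assumptions made up for the example, not part of any particular framework.

```python
import logging
import os

log = logging.getLogger(__name__)


class BadDataError(ValueError):
    """Raised in dev when a function receives data it cannot handle."""


def is_strict():
    # Assumption: the build or deploy step sets APP_ENV to "dev" or "prod".
    return os.environ.get("APP_ENV", "dev") == "dev"


def check(condition, message, strict=None):
    """Return True if the condition holds.

    On failure: raise in strict (dev) mode; log a warning and
    return False in lenient (prod) mode.
    """
    if condition:
        return True
    if strict is None:
        strict = is_strict()
    if strict:
        raise BadDataError(message)
    log.warning("bad data: %s", message)
    return False


def average(values, strict=None):
    if not check(len(values) > 0, "average() got an empty list", strict):
        return 0.0  # lenient fallback so prod can keep going
    return sum(values) / len(values)
```

In dev, `average([])` stops immediately with a `BadDataError`. In prod, it logs a warning and returns the fallback, so the rest of the work can blunder on.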

Then the next thing to do is to wrap the incoming arguments in what they used to call asserts. An assert is a statement declaring that a variable holds only very specific values, and that range of values is exactly what the code below it will handle.

So, if you have a variable that could be null, but the code can’t handle nulls, you put in an assert that blows up on a null. Nulls aren’t the best example though, in that it is usually far better to support them, everywhere. So, the assert turns into a terminal condition, often of the form that if one of the inputs is null, the output is also null, or something similar.
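The difference between the two is easy to see side by side. In this Python sketch, both function names are invented for the example, and `None` plays the role of null.

```python
def name_length(name):
    # Assert: this function cannot handle None, so blow up right away.
    assert name is not None, "name_length() requires a name"
    return len(name)


def normalize_name(name):
    # Terminal condition: None in, None out. Nulls are supported.
    if name is None:
        return None
    return name.strip().title()
```

Note that Python removes `assert` statements when it is run with the `-O` flag, so a plain `assert` is only a dev-time check in that setup.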

Between the asserts and the terminal conditions, the very first part of the function is dedicated to making sure that all of the incoming variables are exactly what is needed for the code below. That is, they primarily act as a filter, to reject any incoming crap. If the filter doesn’t reject stuff, then the code below triggers. In dev, the filter stops the program; in prod, it puts warnings into the log.

Separating out the crappy variable stuff from the rest of the code has the consequence that it makes the rest of the code a lot simpler. Mixing and matching makes it really hard to tell what the code was really trying to do. So, this split is key to being able to read the code more effectively. Often you skip the asserts and other pre-conditions, then see what should have happened in the average case. Then you might look to see if the pre-conditions are strict enough.

Also, if there are incoming globals, which you can’t avoid for some reason, they belong in these pre-conditions too. You get them first, then later you play with them.
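A minimal Python sketch of that layout follows. The `REPORT_DIR` environment variable, its default, and the function name are hypothetical; the environment simply stands in for any global the function can’t avoid.

```python
import os


def build_report_path(filename):
    # Pre-conditions: fetch anything global up front, then check the inputs.
    # REPORT_DIR is a hypothetical environment variable.
    report_dir = os.environ.get("REPORT_DIR", "/tmp/reports")
    if not filename:
        return None

    # Body: from here on, only local variables are used.
    return os.path.join(report_dir, filename)
```

Because the global is read once at the top, the body below it behaves like a stateless function and can be read on its own.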

So, now that you have this tighter code base, it becomes easier to start reusing parts of it. Maybe you are missing some of these smaller functions? Maybe the thing in the function itself is wrong, or a bad idea? You can refactor quite easily; it doesn’t take much effort. If you maintain the stricter coding standards, that extra work pays off hugely when you have to go back later, and you’ll always have to go back later.

There are a lot more strong habits. It’s essentially an endless pursuit. Lots of this stuff was known and forgotten, but it really is effective in reducing bugs. The key point is that every programmer makes mistakes, so we should expect that. When we are building big stuff, in order to get decent quality we have to lean on good habits, technologies, different types of testing, and even branches of computer science like proof of correctness. It all depends on how much quality we need, since everything except good habits is fairly expensive. Over the last few decades, the quality of work has been steadily falling, but given our world’s dependence on software, people will eventually demand better quality.
