Wednesday, August 21, 2019

Bugs as a Reflection of Coding Issues

A long, long time ago I read a book about the most popular programming errors in C. Off-by-one was in the top ten.

I have made thousands of off-by-one bugs in my career over the decades. It is easily my most common mistake, and you’d think that knowing this, I would be able to reduce how often I make it.

I suspect that the problem comes from the way I see code. When I am focussed on the higher-level mechanics and flow, I easily ignore the low-level details. So, if I am coding a complex algorithm that is manipulating lots of arrays, whether their bounds are symmetric (both ends inclusive) or asymmetric (start inclusive, end exclusive) is not particularly interesting. Depending on the tech stack, it’s not uncommon to see a lot of switching between the two, and when I am not paying attention, I run a 50/50 chance of getting it wrong. Which I do, a lot.
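To make the two conventions concrete, here is a minimal sketch in Python. The function names and the sample list are made up for illustration, and it assumes that "asymmetric" means a half-open range (the end index is excluded) while "symmetric" means both ends are included.

```python
def sum_half_open(values, lo, hi):
    """Sum values[lo] up to but not including values[hi] (asymmetric bounds)."""
    total = 0
    for i in range(lo, hi):
        total += values[i]
    return total


def sum_inclusive(values, first, last):
    """Sum values[first] through values[last], both included (symmetric bounds)."""
    total = 0
    for i in range(first, last + 1):  # the +1 is exactly where off-by-one creeps in
        total += values[i]
    return total


data = [10, 20, 30, 40]  # illustrative values only

# The same pair of numbers (1, 3) covers a different set of elements
# depending on which convention the function expects.
half_open_result = sum_half_open(data, 1, 3)   # covers indexes 1 and 2
inclusive_result = sum_inclusive(data, 1, 3)   # covers indexes 1, 2 and 3
```

Passing the same two indexes to both functions gives different answers, which is the whole problem: nothing in the call itself tells you which convention is in play.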

Now, knowing this does not let me change my coding mindset, but that doesn’t mean I can’t correct for the problem. So I do. When I am testing to see that the code I’ve written matches my understanding of its behavior, one of the key places I pay attention to is the index bounds. So, if there is an array, for example, I need to add at least a few entries, then remove a few. That is one of the key minimal tests before I can trust the work.
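As a rough sketch of that kind of minimal test, assume a hypothetical fixed-capacity stack backed by an array. The class `BoundedStack` and the function `check_bounds` are invented for this example. The test fills the structure to its limit, tries one more, then empties it and tries one more again, since the edges are where the index arithmetic tends to go wrong.

```python
class BoundedStack:
    """Hypothetical fixed-capacity stack, used only to illustrate boundary tests."""

    def __init__(self, capacity):
        self._slots = [None] * capacity
        self._count = 0  # number of stored items, also the next free index

    def push(self, item):
        if self._count == len(self._slots):
            raise OverflowError("stack is full")
        self._slots[self._count] = item
        self._count += 1

    def pop(self):
        if self._count == 0:
            raise IndexError("stack is empty")
        self._count -= 1
        item = self._slots[self._count]
        self._slots[self._count] = None
        return item

    def __len__(self):
        return self._count


def check_bounds():
    """Add a few items, hit the upper edge, remove them all, hit the lower edge."""
    stack = BoundedStack(3)
    for item in ("a", "b", "c"):
        stack.push(item)
    assert len(stack) == 3

    try:
        stack.push("d")  # one past the end must be rejected
    except OverflowError:
        pass
    else:
        raise AssertionError("push past capacity should fail")

    assert [stack.pop(), stack.pop(), stack.pop()] == ["c", "b", "a"]

    try:
        stack.pop()  # one before the start must be rejected
    except IndexError:
        pass
    else:
        raise AssertionError("pop from empty should fail")
    return True
```

Calling `check_bounds()` exercises both ends of the array. If the comparison in `push` were written as `>` against `len(self._slots) - 1`, or the decrement in `pop` happened after the read, this is the test that would catch it.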

As a consequence, even though I make a large number of off-by-one bugs, it is actually very rare for them to get into production. If they do, I generally take that as a warning sign that the development process has at least one significant problem that needs to be addressed right away.

Generalizing, because we model the solutions in our heads, then code them into the machines, that code can be no stronger than our internal models. Well, almost. For a large set of code that has been worked on by many different programmers, the strength is an aggregate of the individual contributions and how well they overlap with each other.

What that means is that you could segment each and every bug in the system by its collective authors and use that to refine the development process.

So, for example, suppose the system is plagued by poor error handling. That’s a symptom of either not enough time or the developers not considering the real operational environment for the code. Time is easy to fix, but teaching programmers to look beyond the specified functionality of the code and consider all of the other possible failures that can occur is tricky. Either way though, the bugs are providing explicit feedback into issues within the development process itself.

It’s actually very similar for most bugs. As well as being problems in themselves, they shed light on the overall analysis, design, coding, and testing of the system. A big bug list for a medium-sized project is an itemized list of how the different stages of development are not functioning correctly. If, for example, the system is doing a poor job of handling the workflow of a given business problem, it’s often due to incomplete or disorganized analysis. If the system can’t keep up with its usage, that comes from technical design. If the interfaces are awkward and frustrating for the user, then the UX design is at fault. And of course, stupidly embarrassing bugs getting out to production are instances of testing failure.
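One way to act on this is to tag each bug with a rough category and tally the list by the stage it points back to. The sketch below is only an illustration: the category labels, the `STAGE_FOR_CATEGORY` mapping, and the sample list are all made up, and a real bug tracker would need its own classification.

```python
from collections import Counter

# Hypothetical mapping from a bug's category to the development stage it implicates,
# following the examples in the paragraph above.
STAGE_FOR_CATEGORY = {
    "workflow": "analysis",
    "performance": "technical design",
    "awkward-interface": "UX design",
    "embarrassing": "testing",
}


def tally_by_stage(bug_categories):
    """Count bugs per implicated stage; unknown categories land in 'unclassified'."""
    return Counter(
        STAGE_FOR_CATEGORY.get(category, "unclassified")
        for category in bug_categories
    )


# Invented sample, not real project data.
sample = ["workflow", "performance", "workflow", "embarrassing", "typo"]

for stage, count in tally_by_stage(sample).most_common():
    print(f"{stage}: {count}")
```

The stages with the highest counts are the ones to look at first. The same tally could be keyed by author or by module instead, depending on which part of the process you want feedback on.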

The way we build software has a profound effect on the software we produce. If we want better software, then we need to change the way we build it, and we need to do this from an observational perspective, not just from speculative opinions (since they likely have their own limited-context problems).