The Programmer's Paradox: Scope

One of the keys to getting good quality out of software development is to control the scope of each line of code carefully.

This connection isn’t particularly intuitive, but it is strong and useful.

We can loosely define the scope of any piece of code as the percentage of other lines of code in the system that ‘might’ be affected by a change to it.

In the simplest case, if you comment out the initialization of the connection to a database, all other lines of code that do things with that database will no longer work correctly. They will error out. So, the scope of the initialization is that large chunk of code that relies on or messes with the data in the database, plus any code that depends on that code. For most systems this is a huge amount of code.

Way back, in the very early days, people realized that global variables were bad. Once you declare a variable as global, any other line of code can access it, so the scope is effectively 100%. If you are debugging, and the global variable changes unexpectedly, you have to go through every other line of code that could possibly have changed it at the wrong time, to fully assess and understand the bug. In a sizable program that would take a crazy amount of time. So, we came to the conclusion long ago that globals, while convenient, were also really bad, and that this is purely a scope issue. We also figured out that the same was true for flow-of-control, like goto statements. As it is true for function calls too, we can pretty much assume it is true in one way or another for all code and data in the system.
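To make the global problem concrete, here is a minimal sketch in Python. All of the names (tax_rate, apply_holiday_discount, and so on) are made up for illustration. The first total depends on whatever any other function did to the global earlier; the second depends only on its arguments.

```python
# Hypothetical example; none of these names come from a real system.
tax_rate = 0.10  # module-level global: any function in the program can rebind it


def apply_holiday_discount():
    """Somewhere far away, another function quietly changes the global."""
    global tax_rate
    tax_rate = 0.0


def total_with_global(price):
    # The result depends on what every other function did to tax_rate earlier.
    return price * (1 + tax_rate)


def total_with_parameter(price, rate):
    # The result depends only on the arguments; the scope is this one function.
    return price * (1 + rate)
```

If total_with_global suddenly returns the wrong number, the search space is every line that can touch tax_rate. If total_with_parameter does, the search space is the function body and its caller.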

Lots of paradigms center around reducing the scope in the code. You encapsulate variables in Object Oriented programming, you make them immutable in Functional Programming. These are both ways of tightening down the scope. All the modifiers like public and private do that too. So do some of the mechanisms for including code from other files, and any sort of package or module name. Things like interfaces are also trying to put forth restrictions on what can be called when. The most significant scope reduction comes from strongly typed languages, as they will not let you do the wrong thing on the wrong data type at the wrong time.
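As a rough sketch of the first two of those mechanisms in Python (Counter, Point and try_to_move are hypothetical names, and Python's "private" is only a naming convention, not something the language enforces):

```python
from dataclasses import dataclass, FrozenInstanceError


class Counter:
    """Encapsulation: the count is meant to change only through increment()."""

    def __init__(self):
        self._count = 0  # leading underscore: private by convention in Python

    def increment(self):
        self._count += 1

    @property
    def count(self):
        # Read-only view; there is no setter, so 'c.count = 5' raises AttributeError.
        return self._count


@dataclass(frozen=True)
class Point:
    """Immutability: once built, the fields cannot be reassigned."""
    x: int
    y: int


def try_to_move(point):
    """Return True if the point could be modified, False if it refused."""
    try:
        point.x = 99  # a frozen dataclass raises FrozenInstanceError here
    except FrozenInstanceError:
        return False
    return True
```

In both cases the set of lines that can change the data shrinks: for Counter it is the body of increment(), for Point it is nothing at all after construction.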

So, we’ve known for a long time that reducing the scope of as much of the code as you can is very important, but why?

Oddly it has nothing to do with the initial coding. Reducing scope while coding makes coding more complicated. You have to think carefully about the reduction and remember a lot of other little related details. It will slow down the coding. It is a pain. It is friction. But doing it properly is always worth it.

The reason we want to do this is debugging and bug fixes.

If you have spent the time to tighten down the scope, and there is a bug in and around that line of code, then when you change it, you can figure out exactly what effect the change will have on the other lines of code.

Going back to the global example, if the variable is local and scoped tightly to a loop, then the only code that can be affected by a change is within the loop itself. It may change the final results of the loop computations, but if you are fixing it, that is probably desirable.

If inside of the loop you referenced a global, in a multi-threaded environment, you will never really know what your change did, what other side effects happened, and whether or not you have really fixed the bug or just got lost while trying to fix it. The bug could be what you see in the code or it could be elsewhere; the behavior is not deterministic. Unlimited scope is a bad thing.
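Here is a small Python sketch of that contrast, using the standard library's ThreadPoolExecutor. The function and variable names are invented for the example. The first worker updates a shared global, so its result can depend on how the threads interleave; the second keeps its running value local to the call and only hands back a result.

```python
from concurrent.futures import ThreadPoolExecutor

running_total = 0  # shared global: every thread can read and write it


def add_to_global(values):
    # Read-modify-write on shared state. Threads may interleave between the
    # read and the write, so the final total is not guaranteed to be right.
    global running_total
    for v in values:
        running_total += v


def sum_locally(values):
    # 'subtotal' exists only inside this call; no other thread can touch it.
    subtotal = 0
    for v in values:
        subtotal += v
    return subtotal


def total_of(chunks):
    # Each worker returns its own subtotal; they are combined in one place.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return sum(pool.map(sum_locally, chunks))
```

A bug in sum_locally can only come from the few lines inside it. A wrong value in running_total could come from any thread, at any time, anywhere in the program.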

A well-scoped program means that you can be very sure of the impact that any code change you make is going to have. Certainty is a huge plus while coding, particularly in a high-stress environment.

There is a bug, and it needs to be fixed correctly right away. Making a bunch of failed attempts to fix it will only diminish the trust people around you have in your ability to get it all working. Lack of trust tends to both make the environment more stressful and also force people to discount what you are saying. It is pretty awful.

There were various movements in the past that said if you did “X” you would no longer get any bugs. I won’t go into specifics. Any technique that helps reduce bugs is good, but no technique will ever get rid of all bugs. It is impossible. They will always occur, we are human after all, and we will always have to deal with them.

Testing part of a big program is not the same as fully testing the entire program, and fully testing an entire program is always so much work that it is extremely rare that we even attempt to do it. In an ancient post, I said that testing was like playing a game of Battleship with a limited set of pegs. If you use them wisely, more of the bugs will be gone, but some will always remain.

This means that for every system, with all its lines of code, there will come a day when there is at least one serious bug that escaped and is now causing big problems. Always.

When you tighten the scope, while you will have spent longer in coding, you will get absolutely massive reductions in the impact of these bugs when they come to light. The bug will pop up, you will be able to look at your readable code and get an idea of why it occurred, then formulate a change for which you are absolutely certain of the total impact. You make the change, push it out, and everything goes according to plan.

But that is if and only if you tightened the scope properly. If you didn’t then any sort of change you make is entirely relying on blind luck, which as you will find, tends to fail just when you need it the most.

Cutting down on the chaos of bug fixing has a longer-term effect. If some bugs made it to production, and the handling of them was a mess, then it eats away at any time needed to continue development. This forces the programmers to take shortcuts, and these shortcuts tend to go bad and cause more bugs.

Before you know it, the code is a huge scrambled mess, everybody is angry and the bugs just keep coming, only faster now. It is getting caught in this cycle that will pull the quality down into the mud like hyper-gravity. Each slip-up in handling the issues eats more and more time and causes more stress, which fuels more shortcuts, and suddenly you are caught up in this with no easy way out.

It’s why coming out of the gate really fast with coding generally fails as a strategy for building stuff. You're trying to pound out as much code as quickly as you can, but you are ignoring issues like scope and readability to get faster. That seems to work initially, but once the code goes into QA or actual usage, the whole thing blows up rather badly in your face, and the hasty quality of the initial code leads to it degenerating further into an icky ball of mud.

The alternative is to come out really slowly. Put a lot of effort into readability and scope on the lowest, most fundamental parts of the system. Wire it really tightly. Everyone will be nervous that the project is not proceeding fast enough, but you need to ignore that. If the foundations are really good, and you’ve been careful with the coding, then as you get higher you can get a bit sloppier. Those upper-level bugs tend to have less intrinsic scope.

Having lots of code will never make a project better. Having really good code will. Getting to really good code is slow and boring, but it will mitigate a great deal of the ugliness that would have come later, so it is always worth it.

Learn to control the scope and spend time to make that a habit. Resist the panic, and just make sure that the things you coded do what they are supposed to do in any and all circumstances. If you want to save more time, do a lot of reuse, as much as you can get in. And don’t forget to keep the whole thing really readable, otherwise it is just an obfuscated mess.

Thursday, April 11, 2024
