Monday, March 9, 2020

Globals

Sometimes we encounter a bug while running software in ‘production’.

This is unavoidable in that all software is a combination of technical and domain issues; the latter means that the foundations are informal. An aspect of informality is that valid things can occur that are currently unknowable, so the logic of the existing code will never fully match reality, and that drift will ‘bug’ people. All programs will always have bugs; the best ones will just have them less often.

When we’ve hit a bug, we replicate it, make code changes, and then test them. However, doing a full regression test over all known permutations of the underlying data is extraordinarily time-intensive. Without such a test, there is a risk that the current change will break some other part of the program as a side-effect. If the code is tightly scoped, we can manually assert that all of its effects will stay within that scope, and we can manually check the entire scope to ensure that those effects are acceptable.

A trivial example is a typo in a message string shown to the user. If that string is only used in one place in the code, then any change to it will only affect that one usage. It’s a pretty safe change.
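A minimal sketch in Python of what that kind of tightly scoped change looks like; the names here are hypothetical, not from any real code base:

```python
# Hypothetical names; a minimal sketch of a tightly scoped change.
# The string is defined once and used once, so fixing the typo
# ("occured" -> "occurred") cannot affect any other behaviour.
SAVE_FAILED_MESSAGE = "An error occurred while saving your file."

def report_save_failure() -> None:
    # The single place in the code base that uses the constant.
    print(SAVE_FAILED_MESSAGE)
```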

A non-trivial example is a change to the structure of the persisted data, perhaps combining two previously separate fields into one. That change includes some number of deletes and modifications. Some other lines of code explicitly depend on the changed data, and they need to be modified to deal with the new schema. But there is also another set of lines that depends on those lines, and another set that depends on those, and so on, going upwards in an inverted tree. It is possible to diligently search through and find all of the affected lines of code, and then assert that there are no unintended side-effects, but for a medium or large program this too is very time-intensive. It also requires significant knowledge about how the underlying language is designed, so one can search through all of the likely but non-obvious corner cases. It is far more common that people just take wild guesses and hope that they are correct.
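A minimal sketch, with hypothetical names, of how such a schema change ripples upward; the directly dependent code must be rewritten, and the code that depends on it must at least be checked:

```python
from dataclasses import dataclass

# Old schema (for reference):
#   first_name: str
#   last_name: str

@dataclass
class Customer:
    # New schema: the two name fields have been combined into one.
    full_name: str

# Directly dependent code: had to be rewritten for the new schema
# (it previously concatenated first_name and last_name).
def display_name(customer: Customer) -> str:
    return customer.full_name

# One level up the inverted tree: this used to sort on last_name, a field
# that no longer exists, so it also has to change, and whatever depends on
# its ordering has to be checked in turn.
def sorted_customers(customers: list[Customer]) -> list[Customer]:
    return sorted(customers, key=display_name)
```

Each rewritten line can change behaviour for its own callers, which is how the checking work fans out level by level.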

So, if that data, or any data derived from it, can be accessed anywhere in the program, then the full scope of any possible impact is global. The same is true for any function or method call: if there is nothing limiting its scope, then it is global.

The benefit of making things global is that you can use them anywhere. The downside is that this convenience makes it nearly impossible to properly know the impact of any dependent changes. So, basically, it’s a classic trade-off between coding faster now and being far more confident about changing stuff later.

One way around this is to be very tight on scope for most of the data and code, say 95% of it. Then, if a bug lands on that data or code, figuring out the impact of any changes is contained. For the rest, where it is absolutely necessary to share data and code across the entire system, the syntax should be explicit and the structural level flat.

For example, if you must have a global parameter that is used everywhere, then access to it should go through just one function call. That makes it easy to search. Its value should be the most decomposed primitive, so there are no further decomposition operations on it. As well, all accesses to the data are read-only; only one encapsulated section of code is ever allowed to modify it. With those sorts of higher-level ‘properties’, any bug involving this data still needs to be checked across the whole code base, but the checks themselves are constrained enough to make them fast. You might be able to assert in minutes that changing the value will not cascade into other bugs.
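A minimal Python sketch of what such a constrained global might look like; the module layout and names are hypothetical:

```python
# Hypothetical sketch: the value lives in one module, and every other part
# of the program reads it through a single, easily searchable function call.

_max_retries: int = 3  # module-private primitive; nothing decomposes it further

def max_retries() -> int:
    """Read-only access; searching for 'max_retries(' finds every use."""
    return _max_retries

def configure_max_retries(value: int) -> None:
    """The one encapsulated section of code allowed to modify the value,
    e.g. called once at startup while loading configuration."""
    global _max_retries
    if value < 0:
        raise ValueError("max_retries must be non-negative")
    _max_retries = value
```

Because every read goes through one searchable call and the value is a plain primitive, verifying that a change won’t cascade reduces to a single pass over the search results.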

One of the key differences that many people miss with growing code bases is that the work needed to assert that a change to globals won’t cause side-effects grows at least exponentially. That happens because of the inverted tree, but it can also happen because of the language corner cases. In a small program, you still might have to check a few places to see if there is an impact, but the program is small, so it’s often just faster to scan all of the code. When the program gets to medium size, scanning the code is still barely possible, but now it might take days instead of minutes or hours. If a program is large and has lots of authors, you can never really be sure what language features they took advantage of, or how they obfuscated some indirect reference to the data, so the exponential increase in basic searching suddenly becomes quite obvious, and scanning all of the code becomes impossible.

Globals are necessary and useful, but they come with a huge cost, so it’s always worth spending a lot of time during development to make sure that they are not heavily impacting testing, bug fixing, extensibility, etc. It is one of those ‘a stitch in time saves nine’ issues, where the payoff might not be immediately obvious, but its contribution to failure sure is.
