Friday, June 15, 2012

Globals and State

One of the great lessons learned -- long ago -- was that making variables ‘global’ in programming code was just asking for trouble. It is, of course, easier to write the code with globals; you just declare everything global and then fiddle with it wherever you want. But that ease comes at the rather horrendous cost of trying to modify the code later. Get enough different sections of code playing with the same global and suddenly it is very complicated to ascertain what any little change to the variable will do to the overall system. So we came to the conclusion that these ‘side-effects’ were both very expensive and completely undesirable. A lot of effort went into making it possible in most modern languages to batten down the scope for everything: variables, functions, etc. We turned our attention towards stuffing as much as possible into ‘black boxes’ -- encapsulation -- so that we minimize each piece’s interaction with the rest of the code base.

Another lesson often learned was that stateless code was considerably easier to debug than code that supported a lot of internal states. If you execute the code and its behavior is identical each time, it is fairly straightforward to determine whether it is correct or not. If, however, the behavior fluctuates based on changing internal state information, then testing becomes a long, drawn-out process of cross-referencing all of the different inputs with the different outputs (a task usually short-changed, leading to corner-case bugs). Test cases become complex sequences that are both time-consuming and hard to reproduce accurately. Simple tests for stateless code mean less work and better quality.

State changes can come from internal modification of variables, but they are most often triggered by things external to the scope of the code. Thus, function A modifies some state information, so that the behavior of function B changes. Generally the call to function A comes from somewhere outside of function B’s code block. This essentially forms an indirect reference to the state for function B, which relies not on a global variable, but rather on a function that could be accessed globally. A global function. When we banished globals we did so for static variable declarations; however, a code-based dynamic call is essentially the same thing. In a very real sense, any part of the program that is subject to changes, either direct or indirect, that originate from other parts of the program is some type of global. Global data or global action, it doesn’t matter.
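As a minimal sketch of that function A / function B relationship, consider the Python fragment below. The names (`_tax_rate`, `set_tax_rate`, `total`) are hypothetical and invented purely for illustration. The variable is ‘private’ by convention, yet any caller anywhere can change what `total` returns.

```python
# Hypothetical module-level state: private by naming convention only.
_tax_rate = 0.0


def set_tax_rate(rate):
    """Function A: changes state that other functions silently depend on."""
    global _tax_rate
    _tax_rate = rate


def total(price):
    """Function B: its result depends on whoever last called set_tax_rate."""
    return price * (1 + _tax_rate)


before = total(100.0)  # uses the initial rate
set_tax_rate(0.5)      # could be called from anywhere in the program
after = total(100.0)   # same argument, different answer
```

Nothing here is declared as a global variable in the public interface, but `set_tax_rate` acts as a global action: testing `total` now means knowing the full history of calls to it.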

Ideally, to make everything easily testable we’d like 100% of all arguments explicitly pushed into every function, and to support changes within the system we’d like 0% side-effects, so that everything changed is returned from the function. Global-less, stateless code.
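A rough Python sketch of that ideal might look like the following. The function names (`total_with_tax`, `add_item`) are made up for the example. Every input arrives as an argument and every change leaves as a return value.

```python
def total_with_tax(price, tax_rate):
    """All inputs are explicit arguments; nothing is read from outside."""
    return price * (1 + tax_rate)


def add_item(cart, item):
    """Return a new list instead of modifying the one passed in."""
    return cart + [item]


original_cart = []
updated_cart = add_item(original_cart, "book")  # original_cart is untouched
```

Each call can be tested in isolation with a single assertion, since the same arguments always produce the same result.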

Often in APIs there are a large number of different primitives available. Different users of the API will access subsets of these functions in many different orders. In most Object Oriented (OO) languages this is handled by using something equivalent to set/get methods to alter the state of internal private variables, which other primitives use as values in their calculations. However, these methods are only available if the object is within the scope of the caller, so this has the effect of constraining their usage. So long as the object is not global, the methods are not either. The object becomes a local variable, and interactions with it can happen in any order necessary. However, you can violate this easily, either by declaring the methods static or by creating the object as a Singleton. Either way, you introduce a global effect.
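To make the contrast concrete, here is a small Python sketch. Both classes (`Formatter`, `GlobalFormatter`) are hypothetical and stand in for any API object with set/get-style state. The first keeps its state per instance; the second keeps it at class level, which behaves much like a static or Singleton.

```python
class Formatter:
    """State lives in the instance, so it is scoped to whoever holds it."""

    def __init__(self):
        self._precision = 2

    def set_precision(self, digits):
        self._precision = digits

    def render(self, value):
        return f"{value:.{self._precision}f}"


class GlobalFormatter:
    """State lives on the class, so every caller shares it."""

    precision = 2

    @staticmethod
    def render(value):
        return f"{value:.{GlobalFormatter.precision}f}"


local_a = Formatter()
local_b = Formatter()
local_a.set_precision(0)       # affects local_a only

GlobalFormatter.precision = 4  # affects every caller in the program
```

Changing `local_a` leaves `local_b` alone, while the one assignment to `GlobalFormatter.precision` changes the output for all code that calls `GlobalFormatter.render`.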

Another way to mess with things is to have the internal data be a reference to an object that lives outside of the scope. When changes to that underlying object can occur anywhere in the code, this is another form of global manipulation.
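The Python sketch below illustrates this with two hypothetical classes, `Report` and `SafeReport`. The first holds on to the dictionary it was given; the second takes its own copy. A shallow copy is enough here only because the example dictionary is flat.

```python
class Report:
    def __init__(self, settings):
        self._settings = settings  # keeps a reference to the caller's object

    def title(self):
        return self._settings["title"].upper()


class SafeReport:
    def __init__(self, settings):
        self._settings = dict(settings)  # shallow copy, owned by this object

    def title(self):
        return self._settings["title"].upper()


shared = {"title": "draft"}
report = Report(shared)
safe = SafeReport(shared)

shared["title"] = "final"  # a change made somewhere else in the program
```

After the last line, `report.title()` reflects the outside change while `safe.title()` does not, even though neither object was touched directly.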

In most systems, particularly if there is an interface for users, there is a considerable amount of mandatory state. Basically the computer is used to remember the user’s actions so that the user doesn’t have to keep supplying all of the contextual data over and over again. Depending on how the surrounding session mechanics interact with the underlying technology, this can leave a lot of little pieces of required state lying all over the code. This of course is a form of spaghetti (variable-spaghetti) and it can be quite nasty because it gets placed everywhere. Cleaning this up means collecting all of the state information together into a single location for any given technical context. So for instance, in a web app there is likely one big collection of user state information in the browser, and another collection associated with the user’s session in the server. That’s fine, and considerably better than having a huge number of small pieces scattered across both places.
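One way to picture ‘a single location’ on the server side is a single session object, sketched here in Python. The class `SessionState`, its fields, and the helper `next_page` are all assumptions made up for the example, not part of any particular web framework.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class SessionState:
    """All per-user server-side state, gathered in one place."""

    user_id: str
    locale: str = "en"
    current_page: int = 1
    filters: dict = field(default_factory=dict)


def next_page(state):
    """Return an updated copy; the caller decides where to store it."""
    return replace(state, current_page=state.current_page + 1)


session = SessionState(user_id="u1")
advanced = next_page(session)
```

With the state gathered like this, it is passed into the code that needs it rather than discovered piecemeal, and a test can build any session it wants in one line.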

Long ago we identified that global variables were a big problem and in many circles we banished them. But I think we focused too hard on the ‘variable’ part, and not enough on the ‘global’ aspect. Global anything is a potential problem, anywhere. Like disorganization and redundancies, it is just a pool of gasoline waiting for a match. Software systems can contain an outrageous amount of complexity, and the only way to deal with it effectively is by encapsulating as much as possible into well-organized sub-pieces. If you break that encapsulation... well, you are pretty much right back to where you started ...