Thursday, February 20, 2020

Artificial Complexity

Originally, I think it was Frederick Brooks who coined the term ‘accidental’ complexity as a means of expressing all of the additional complexity that exists on top of the base complexity of a problem. This is similar to Einstein’s ‘as simple as possible, but no simpler’, which is itself another way of decomposing the two underlying types of complexity.

Really though, any additional complexity is hardly an accident. People, to different degrees, are not always able to see what is simplest. Calling that an accident is charitable, but I prefer just labeling it as ‘artificial’ complexity, since it sits on top of the ‘real’ complexity of the problem. It’s there; we shouldn’t be concerned with ‘why’ it is there.

Once we get through those definitional issues, it leads to a sense that we might be able to get down to some minimal complexity. Since there are always a large number of different decompositions, with different trade-offs, it’s best to stay away from an absolute ‘minimum’ concept. Given any problem, and enough time, work, and knowledge, we can remove a lot of artificial complexity from a solution; the key here is that removing such complexity should not in any way change the proposed solution. It remains the same. If something is removed and the solution changes, then it was intrinsically part of the base complexity, and the new solution is a different one from the original.

We can tighten this down in programming by coming close to Kolmogorov complexity. That is, we can talk about two programs that behave identically but differ in size. We also have to consider the property of ‘readability’: if one shrinks the code in an overly clever way, it may be smaller, but at the cost of diminishing the readability. That might be fine for the next release, but it puts up an obstacle for future development, so we only want to shrink the code in ways that don’t lower the readability.
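
As a small, hypothetical sketch of that trade-off, here are two Python functions with identical behavior; the names and the task are invented for illustration. The second is shorter, so it scores better on a Kolmogorov-style size measure, but it is clearly worse to read and maintain.

```python
# Readable version: the intent is stated plainly.
def total_even_squares(numbers):
    """Sum the squares of the even numbers in the list."""
    total = 0
    for n in numbers:
        if n % 2 == 0:
            total += n * n
    return total

# Behaviorally identical, but squeezed into a cryptic one-liner.
f = lambda l: sum(x * x for x in l if not x % 2)
```

Both return the same results for every input, so only the kind of shrinking that preserves readability is worth doing.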

Basically, if one program has dead code in it, then getting rid of that is good. It obviously doesn’t change the behavior. If there is a readable way of rearranging the syntax, that is also good. If there is a way of rearranging the statement/function structure that preserves the behavior, but does a better job of organizing the code, either by defragmenting or flattening it, then that is also good.
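
A rough sketch of those behavior-preserving reductions, assuming a made-up ‘order’ record with a ‘total’ field:

```python
# Before: a dead branch and unnecessary nesting obscure the logic.
def classify(order):
    if order is not None:
        if order.total > 0:
            if False:  # leftover from a feature that no longer exists
                return "legacy"
            return "billable"
        else:
            return "empty"
    return "missing"

# After: identical behavior, with the dead code gone and the logic flattened.
def classify(order):
    if order is None:
        return "missing"
    if order.total > 0:
        return "billable"
    return "empty"
```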

Reducing all of the variables to one-character names or acronyms would be bad, as would squashing the logic with syntactic tricks. Dropping error handling, or switching to a heuristic instead of a slower but correct algorithm, falls into the bad category as well.
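
A tiny, hypothetical example of that kind of ‘bad’ shrinking, using an invented config loader:

```python
import json

# Readable version that handles the failure cases explicitly.
def load_config(path):
    try:
        with open(path) as f:
            return json.load(f)
    except (OSError, json.JSONDecodeError):
        return {}  # fall back to defaults if the file is missing or broken

# Fewer characters, but the name is meaningless and the error handling is
# gone; the behavior has changed, so this is not a valid reduction.
def lc(p):
    return json.load(open(p))
```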

Collapsing a large number of special cases into some generalized case, without significant adverse effects on performance, would be good, in that there are now fewer pieces of the puzzle to worry about. That does come at the external cost of having to communicate the abstraction somehow, but that is often more of a staffing issue than a construction one. Encapsulation does increase the complexity, but it also isolates it, so the context becomes manageable.
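
A minimal sketch of collapsing special cases, with invented report formats and a toy ‘render’ helper:

```python
def render(rows, sep):
    """Join each row's fields with the separator, one line per row."""
    return "\n".join(sep.join(str(field) for field in row) for row in rows)

# Before: one special case per format, each repeating the same pattern.
def export_csv(rows):
    return render(rows, sep=",")

def export_tsv(rows):
    return render(rows, sep="\t")

def export_psv(rows):
    return render(rows, sep="|")

# After: one generalized case; the variation becomes data instead of code.
SEPARATORS = {"csv": ",", "tsv": "\t", "psv": "|"}

def export(rows, kind):
    return render(rows, sep=SEPARATORS[kind])
```

The communication cost mentioned above shows up here too: anyone reading export() now has to know that the variations live in the SEPARATORS table.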

As with code, the same principles hold true for the data. We can rearrange the model, in terms of types, structure, and constraints, to remove artificial complexity from how the data is persisted and how it travels between the components of the system. This type of normalization is well understood with respect to relational algebra, but it can be reapplied to any formal system, including code, if size and other properties like readability are set aside.
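
The same idea in miniature, with an invented orders example shown as plain Python data:

```python
# Denormalized: the customer's name is repeated in every order, so a
# rename has to touch many records and the copies can drift apart.
orders = [
    {"order_id": 1, "customer": "Acme Ltd", "total": 120.0},
    {"order_id": 2, "customer": "Acme Ltd", "total": 80.0},
]

# Normalized: each fact is stated once; orders refer to the customer by id.
customers = {101: {"name": "Acme Ltd"}}
orders = [
    {"order_id": 1, "customer_id": 101, "total": 120.0},
    {"order_id": 2, "customer_id": 101, "total": 80.0},
]
```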

Since software is a map between the informality of the real world and the formal mechanics of a computer, the map itself is subject to considerable artificial complexity. Most often, in domain-specific code, this is where the worst problems originate. Some of these are rooted in uncertainties, some in a lack of knowledge, and some are simply carried along by history. Often they aren’t intrinsically technical issues, but rather misunderstandings of the underlying properties of the domain, like whether or not aspects of it are dynamic.

Since these properties are so easily over-simplified, the best option is to minimize any constraints unless there is definitive proof that they are rigid. That is, assume the widest possible circumstances, then restrict them gradually as understanding improves. That falls afoul of impatience, but it tends toward a more efficient long-term approach if everything is really taken into consideration, like how long a solution can remain viable without major modifications. Getting that relationship backward is a self-fulfilling prophecy, in that excessive artificial complexity can cross a threshold that quickly diminishes any value in the work, leading to a ‘why bother’ conclusion. Line by line, we’ve seen enough code that is decades old to know that we frequently underestimate the possible lifespan of our work. Quality, battle-tested, leverageable code, when it aligns well with the real world, is sticky. More of that is obviously more efficient. It’s a write-once, use-forever philosophy.
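
As a small, made-up illustration of starting wide and tightening later:

```python
# Over-constrained: assumes every address has a five-digit, digits-only
# code, which only holds for some regions.
def validate_postal_code(code):
    if not (len(code) == 5 and code.isdigit()):
        raise ValueError("invalid postal code")
    return code

# Wider: accept any non-empty string for now, and only add restrictions
# once there is definitive proof that the domain actually requires them.
def validate_postal_code(code):
    code = code.strip()
    if not code:
        raise ValueError("postal code is required")
    return code
```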

Adding artificial complexity is really expensive in the long run, but it often seems like a cheaper answer in the short run, mostly because full consideration isn’t given. For people who have seen minimal complexity, it’s an obviously better choice, but since we so haphazardly throw artificial complexity everywhere, that experience isn’t the norm.
