Thursday, February 27, 2020

How to Shoot Yourself in The Foot

Writing small software programs is easy. Just ‘code and go’.

But this changes dramatically as the size of the code grows.

For any potentially usable software, there is ‘value’. If the software has value -- in that it does what it is supposed to do and it does it reliably -- then people will want to use it. If the value in the software is too low, then they will try to get away from it.

That can be true even if the system is getting enhanced. Sometimes the new incoming features don’t provide enough new value, while the existing ones stagnate and decay. That’s a great situation in that some people can now claim the system has ‘grown’, but the overall value has really not matched the growing expectations of the users, so they leave anyways.

Going from the top down, the best way to kill value is to add more features into the overall navigation, so that the navigation itself becomes a maze. That tortures people because they know that the functionality is there, someplace; they just don’t have time to search through the whole freakin system to find it.

Non-deterministic features are great too. Basically, if people can’t predict what the code will do next, then they quickly lose their appetite to use it. Closely related to this are systems that are up, then down, then up, then ...
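
As a minimal sketch (in Python, with made-up names), this is the kind of thing that creates that unpredictability: the first version leans on unordered iteration, so repeated runs can disagree about what the user sees.

def summarize(tags):
    # Deduplicating with a set discards the original ordering, and with hash
    # randomization the iteration order of strings can change between runs,
    # so the 'top' tags a user sees may differ from one run to the next.
    unique = set(tags)
    return list(unique)[:3]

def summarize_stable(tags):
    # Deterministic alternative: dedupe while preserving first-seen order.
    seen = []
    for t in tags:
        if t not in seen:
            seen.append(t)
    return seen[:3]

print(summarize(["red", "blue", "green", "blue", "amber"]))         # order may vary between runs
print(summarize_stable(["red", "blue", "green", "blue", "amber"]))  # always the same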

Crappy data quality bugs people as well. Particularly if the system has lots of data, some of it is great, some of it is tragically wrong, and you can’t tell which is which. Nothing is more uninviting than having to scroll through endless bad data to find the stuff you are looking for. Useless searching of too much data falls into this category too.

While these are all external issues, some products peak early in their lives and then start the slow but steady descent into being awful. Usually, that comes from internal or process issues.

If the code is a mess, testing can hide that for a few versions, but eventually, testing will not be enough and the mess will percolate out to the users. Good testing cannot help with poor analysis, bad design, disorganization or messy code.

In the code, if a large group of programmers each builds their own parts in their own way, then much of their work is both redundant and conflicting. So, letting everyone have total creative freedom over their efforts will eventually kill a lot of value.
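
A tiny hypothetical Python example of what that looks like in practice: two halves of the same system ‘normalizing’ the same value in slightly different ways, then silently disagreeing.

# Team A's helper: trims whitespace and lowercases.
def normalize_name(name):
    return name.strip().lower()

# Team B's helper: lowercases but keeps the surrounding whitespace, so the
# two parts of the system quietly disagree about which names are 'the same'.
def clean_name(name):
    return name.lower()

print(normalize_name("  Alice ") == clean_name("  Alice "))  # False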

Another fun way to trash value is to go wild with the dependencies. Grabbing every library under the sun and just adding them all in quickly becomes unmanageable. As many of the libraries have their own issues, just keeping the versions up-to-date is a mammoth task. What seems like a Cambrian explosion of features turns into just an explosion.

Within the code itself, any approach that increases the cognitive load without also providing enough added benefits is sure to deplete value. This happens because the approach acts as a resource ‘sink’ that pulls time away from better activities like thinking, communicating, or organizing. Wasting time in a time-starved endeavor will only exacerbate all of the other problems.

A great thing to do is just to ignore parts of the system, hoping that they will just magically keep working. But what isn’t understood is always out of control, even if it isn’t obvious yet. If enough trouble is brewing, eventually it will overflow.

If the interface is great and the code is beautiful, that still doesn’t mean that the value won’t disappear with time. Building systems involves five very different stages, so ignoring any of them, or trying to treat all of them the same, or even just going at them backward, sets up a turbulent stream of work. Code coming from class VI rapids is highly unlikely to increase value.

The trick to moving a system from being small to being huge is that while doing so, the value increases enough that the users still want to play with the code and data. If you focus on some other attribute and accidentally kill that value, then you now have added a rather large hole to your foot.

Thursday, February 20, 2020

Artificial Complexity

Originally, I think it was Frederick Brooks who coined the term ‘accidental’ complexity as a means of expressing all of the additional complexity that exists on top of the base for a problem. This is similar to Einstein's ‘as simple as possible, but no simpler’, which in itself is another way of decomposing the two underlying types of complexity.

Really though, any additional complexity is hardly an accident. People, to different degrees, are not always able to see what is simplest. Calling that an accident is nice, but I prefer just labeling it as ‘artificial’ complexity, since it sits on top of the ‘real’ complexity of the problem. It’s there; we should not be concerned with ‘why’ it is there.

Once we get through those definitional issues, it leads to a sense that we might be able to get down to a minimal complexity. Since there are always a large number of different decompositions, with different trade-offs, it’s best to stay away from an absolute ‘minimum’ concept. Given any problem, and enough time, work, and knowledge, we can work hard to reduce a lot of artificial complexity from a solution; the key here is that removing such complexity should not in any way change the proposed solution. It remains the same. If something is removed and the solution changes, then it was intrinsically part of the base complexity, and the new solution is a different one from the original.

We can tighten this down in programming by coming close to Kolmogorov complexity. That is, we can talk about two programs that behave identically but are different in size. We also have to consider the property of ‘readability’: if one shrinks the code in an overly clever way, it may be smaller, but at the cost of diminishing the readability. That might be fine for the next release, but it puts up an obstacle for future development, so we only want to shrink the code in ways that don’t lower the readability.
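
A small sketch, with made-up functions, of what ‘identical behavior, different size’ means here: both versions count word frequencies, but the smaller one pays for its size in readability (and, incidentally, in speed).

# Readable version: the intent is obvious at a glance.
def word_counts(text):
    counts = {}
    for word in text.split():
        counts[word] = counts.get(word, 0) + 1
    return counts

# Smaller, 'clever' version: behaviorally identical, but harder to read (and
# quietly quadratic, since it rescans the text for every unique word).
word_counts_tiny = lambda t: {w: t.split().count(w) for w in set(t.split())}

assert word_counts("to be or not to be") == word_counts_tiny("to be or not to be")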

Basically, if one program has dead code in it, then getting rid of that is good. It obviously doesn’t change the behavior. If there is a readable way of rearranging the syntax, that is also good. If there is a way of rearranging the statement/function structure that preserves the behavior, but does a better job of organizing the code, either by defragmenting or flattening it, then that is also good.
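
For example, something like this hypothetical before-and-after: the dead variable disappears and the nesting is flattened, yet every input still produces exactly the same answer.

# Before: a dead variable and needless nesting obscure a simple rule.
def shipping_cost(order_total, is_member):
    legacy_rate = 0.15  # dead code: never used below
    if is_member:
        if order_total > 50:
            return 0.0
        else:
            return 5.0
    else:
        if order_total > 50:
            return 0.0
        else:
            return 8.0

# After: the dead code is gone and the logic is flattened, but the behavior
# is identical for every possible input.
def shipping_cost_flat(order_total, is_member):
    if order_total > 50:
        return 0.0
    return 5.0 if is_member else 8.0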

Reducing all of the variables to one-character names or acronyms would be bad, as would squashing the logic with syntactic tricks. Dropping error handling, or switching to a heuristic instead of using a slower but correct algorithm, both fall into the bad category as well.

Collapsing a large number of special cases into some generalized case, without significant adverse effects on performance, would be good, in that there are now fewer pieces of the puzzle to worry about. That does come at the external cost of having to communicate that abstraction somehow, but that is often more of a staffing issue than a construction one. Encapsulating does increase the complexity, but it also isolates it, so the context becomes manageable.
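
A rough illustration, using an invented tax-rate example: the special cases collapse into one data-driven lookup, so there is a single piece of logic left to understand, test, and extend.

# Special-cased version: every new country means another branch to write,
# read, and test.
def tax_rate_special(country):
    if country == "CA":
        return 0.13
    elif country == "UK":
        return 0.20
    elif country == "DE":
        return 0.19
    else:
        return 0.0

# Generalized version: the cases collapse into a data-driven lookup, so
# there is one piece of logic, and new cases are just data.
TAX_RATES = {"CA": 0.13, "UK": 0.20, "DE": 0.19}

def tax_rate(country):
    return TAX_RATES.get(country, 0.0)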

As it is with code, the same principles hold true for the data as well. We can rearrange the model, in terms of types, structure, and constraints, to remove artificial complexity from its persistence and from its travels between the components of the system. This type of normalization is well understood with respect to relational algebra, but it can be reapplied to any formal system, including code, if size and other properties like readability are set aside.
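
Sketched in code rather than tables (the names are invented), the idea looks something like this: the denormalized form repeats the same facts until they drift apart, while the normalized form says exactly the same thing, once.

# Denormalized: the customer's details are repeated on every order, so the
# same fact lives in several places and can quietly drift out of sync.
orders_denormalized = [
    {"order_id": 1, "customer": "Ada Lovelace", "city": "London", "total": 40.0},
    {"order_id": 2, "customer": "Ada Lovelace", "city": "Londn", "total": 15.0},  # drifted copy
]

# Normalized: each fact is stored once and referenced by key; the model
# expresses the same information with less artificial complexity.
customers = {101: {"name": "Ada Lovelace", "city": "London"}}
orders = [
    {"order_id": 1, "customer_id": 101, "total": 40.0},
    {"order_id": 2, "customer_id": 101, "total": 15.0},
]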

Since software is a map between the informality of the real world and the formal mechanics of a computer, the map itself is subject to considerable artificial complexity. Most often, in domain-specific code, this is where the worst problems originate. Some of these are rooted in uncertainties, some come from a lack of knowledge, and some are even carried along by history. Often they aren’t intrinsically technical issues, but rather misunderstandings of the underlying properties of the domain, like whether or not aspects of it are dynamic.

Since these properties are so easily over-simplified, the best option is to minimize any constraints unless there is definitive proof that they are rigid. That is, assume the widest possible circumstances, then restrict them gradually as understanding improves. That falls afoul of impatience, but it tends towards a more efficient long-term approach if all things are really taken into consideration, like how long a solution can remain viable without major modifications. Getting that relationship backward is a self-fulfilling prophecy, in that excessive artificial complexity can cross a threshold that quickly diminishes any value in the work, leading to a ‘why bother’ conclusion. Line by line, we’ve seen enough code that is decades old to know that we frequently underestimate the possible lifespan of our work. Quality, battle-tested, and leverageable code, when it aligns well with the real world, is sticky. More of that is obviously more efficient. It’s a write once, use forever philosophy.
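
As a loose sketch of ‘widest first, restrict later’, with hypothetical identifiers rather than a real rule: the rigid version bakes an unproven assumption into the code, while the wider version leaves room to tighten once the constraint is actually confirmed.

# Overly rigid first guess: assumes every identifier is exactly 10 digits,
# baking an unproven domain assumption directly into the code.
def parse_id_rigid(raw):
    if not (raw.isdigit() and len(raw) == 10):
        raise ValueError("id must be exactly 10 digits")
    return raw

# Wider starting point: accept any non-empty token, normalize it, and leave
# room to add a stricter rule later, once the real constraint is confirmed.
def parse_id(raw):
    token = raw.strip()
    if not token:
        raise ValueError("id must not be empty")
    return token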

Adding artificial complexity is really expensive in the long run, but it often seems like a cheaper answer in the short run, mostly because full consideration isn’t given. For people who have seen minimal complexity, it’s an obviously better choice, but since we so haphazardly throw artificial complexity everywhere, that experience isn’t the norm.