The Programmer's Paradox: July 2019

It is generally understood that the best way to solve a large problem is by breaking it down, decomposing it into bite-sized pieces.

While that overall approach is easy to say, it is actually quite difficult to list out the specific steps for how to decompose problems into their ‘atomic’ components. As well, it is often forgotten that once all of these pieces have been decided upon, they still need to be assessed together in order to ensure that they still cover the original problem.

For that first issue, large problems are intrinsically complex, that’s their definition. So it's worth noting that the full weight of all of their details exceeds the ability of any single person to internally understand or visualize them. That’s why they are considered large.

To get beyond their size, we essentially rely on layering. We take the original problem and break it down one level at a time into a new set of smaller problems. Each of these is a layer in the solution.

Obviously adding layers increases complexity for the overall problem, so it is vital that each new layer only contains pieces that are independent from each other. That is, the complexity needs to be split cleanly, any extra complexity from adding the layer should be less than the complexity of the underlying pieces.

If we were to think of this more formally, the sum of the new pieces should be less than the altered whole. That seems easy and obvious, but it entirely relies on the pieces being independent of each other. If they aren't, then the worst case is that for dependent pieces, they inherit each other’s complexity, their individual complexities are the combined. To be specific, take a problem P, and break it into 3 pieces, c1, c2, and c3. If c1 and c2 are intertwined then in the worst case we get c(P) + L <= (c1 + c2) + (c2 + c1) + c3 where L is the cost of adding a new layer. By decomposing the problem into a ‘blurry’ layer, we’ve essentially increased the artificial complexity beyond any benefits of adding that layer.

That is the quantitative cost, but there is a human cost as well. The combination of the first and second parts have only been reduced by around 2/3rds of the whole, not the full 1/3rd that we could have had to bring this part of the problem down into a manageable size. This builds up. So, if it should have taken 4 layers to contain the solution, we might need to double that to 8 because of the overlaps.

This points back again to the importance of ensuring that any breakdown is only useful if the pieces themselves are independent.

The secondary problem, particular with scaling the solution, is to have gaps between the pieces. If the pieces fit poorly, then slightly different reassemblies will create new problems. The solution isn’t scalable. Most solutions only have sufficient value if they are applied more than once, but gaps can interfere with being able to do that.

Both issues: overlaps and gaps, strongly diminish the decomposition. If we add in enough blurry layers, we have seen in practice that we can bloat the complexity exponentially or worse. In that sense, while layers are necessary to solve the problem, they are also risky.

So, the underlying issue is that given a big problem how do you subdivide it cleanly into independent, atomic, pieces? The unfortunate answer is that you can only really do this if you fully understand all of the details. But often that is not possible, and as previously mentioned the problem is already too large for any individual to handle.

The approx answer is that we need to get close enough to a reasonable solution, then slowly converge on the best answer.

To do this, decomposition requires many forms of definition and categorization. We can start these loosely, and continually revise them. For this, we want as many instances of the problem as we can get our hands-on. For each of them, we can precisely define them with all of their known details, then we can parse that information into ‘syntactic’ components, essentially verbs and nouns. From here we can split any clashes in say nouns, subdividing them until all of the obvious overlaps are gone. Then for each instance, this gives us a breakdown of what are associated attributes, basically the smallest verbs and nouns. With this set, for each instance, we can partition all of the instances from each other. In doing this, we have to count what are really the dimensions of variability (since the axis provide a natural decomposition).

It’s worth noting that any variable dimensionality must match. If you construct a 1D categorization over 2 dimensions, it increases the likelihood that one of the dimensions will span multiple categories, which becomes a dependency, so it bumps up the complexity. However, if you have 2 distinct categorizations for each of the dimensions, then you can work with the permutations at a higher layer to combine them at the cost of adding in that extra layer. In that way, as we are making sense of all of the special cases and categorizing them, we are also building up these layers. The layering itself though can be seen as a more general instance of another problem that needs its own solution, so it is upwardly recursive.

A somewhat related programming example is that you want to define an architecture (overall organization) for a fairly large system. You might split all of the code by its usage, say between an interactive interface and a batch reporting facility. But along with usage, there might be shared commonalities for specific data types like strings. The usage of the code is a different dimension from the shared data-structure manipulation code. We’d like the manipulations to only be specified once (so they are independent) but they are equally likely to be necessary on either side of the usage breakdown. We need them for the interface and we need them in batch. Without giving it much consideration, programmers often bind the usage dimension to the higher architectural arrangements but keep a hybrid category called a shared library available to span any architectural divide. Occasionally, you see this done cleanly, but most often the failure to admit these are different dimensions leads to a hodgepodge of redundant and shared code. So, because of that, it is an easy issue to spot in a large codebase, but an extraordinarily hard one to solve with a trivial solution.

Given all of the above, to really get going with a large decomposition means collecting a large number of examples, breaking them down into attributes, identifying the dimensions, then drawing all of the vertical and horizontal lines between them. At that point, one can cross-check that artificial examples do not fit into multiple boxes and that any permutation has only one unique location. For completeness, the boxes need names that truly reflect their content. At this point, the first step is done.

As mentioned at the start, after decomposition, all of the boxes need to be specified fully, then recomposed to see that they still work together. If everything were truly independent, and it was obvious, then this next step could be skipped, but again this problem is large enough so that neither of those preconditions exists.

Given a small enough box, a capable person can now produce a solution, but the overall context may have been lost. This box may contain a significant amount of variability and it is this intrinsic freedom that is dangerous. Thus, each box still contains a large number of possibilities, but not all of these solutions will interact correctly with the whole.

Another issue is that the necessary size of the boxes is dependent on individuals. Some people can cope with larger boxes, some cannot, so the boxes may still need further decomposition.

At some point with recomposing, it is likely that some missed dependency will creep in. One little box in one part of the solution will be found to be unexpectedly tied to another box somewhere else. Given the size, scope and general usage of the solution there are multiple ways of handling this. The best, but most time-intensive, is to roll the dependency upwards until it reaches a layer were the cross-dependency exists, then just recategorize that layer and all of the affected children.

Sometimes, due to operational or time issues, that is not possible, so the alternative is documentation. The dependency is noted in the attached documentation but the instances are handled redundantly. That type of documentation needs to be bound to the solution, and to stay that way for its entire usage. The worst thing to do is to ignore or dismiss the problem, as it is most likely to set other problems into motion.

A major concern with the above is the fear that rigorously following it will lead to too many layers. Some layers exist mainly for the purpose of bringing down the complexity, others are tightly bound to discovered dimensions. Obviously invalid layers that do neither are just increased complexity without benefit, but for the necessary layers, the underlying degree of sophistication of the solution is for the most part dependant on their existence. If you remove them, the solution is unknowable or it is oversimplified. In the first case, an unknowable amount of complexity will not be predictable and so is not trustworthy. Eventually, it won’t be the solution, but rather take its place as part of the problem itself, so it is a rather negative contribution. Being over-simplified is similar. It won’t really solve the full problem and will spin off lots of sub-problems as part of its usage. Generally, things get worse, but not necessarily linearly.

Relying on a faulty solution may take a long time to trigger the full weight of the mistake. It might seem like it worked, but that’s misleading. Because of that, for a given problem there is a bound on the necessary number of layers required for a tight-fitting solution. Comprehension and dimension handling open the door for some wiggle room, but it is safe to say that there are some problems that need a nearly fixed number of layers to solve properly. So, if the sophistication of the problem requires around 20 layers, but the proposed solution only has 5, we can pretty well infer that that specific solution will not really handle that set of problems. That at some point, it will go horribly wrong. If the proposed solution has 30 layers, again we can often infer that it will take longer than necessary to implement it and that it could be difficult to extend when needed. There are always a lot of possible solutions, but very few will fit properly.

With all of the above in mind, identifying a problem then decomposing it into pieces to build a solution has a lot of small subtleties that for problems that are highly intertwined make it tough to get real workable solutions. From a dependency standpoint, problems that are trivially independent are trivially solvable, but all it takes to throw that off is non-obvious dimensions that weed their way through the decomposition. In a real sense that is why most problems look a lot easier on the outside then they do on the inside when you’ve acquired deep knowledge about them. In that depth, the details lead to interconnections, which bind the parts at a distance. Not seeing those is the beginning of the end.

The Programmer's Paradox

Wednesday, July 31, 2019

Breaking it Down