Friday, February 24, 2012

Complexity and Decomposition

The standard approach to getting a big project accomplished is to identify the problem, come to an overall solution, then keep breaking it down into smaller, more manageable pieces. Once these are small enough -- specific tasks -- all that’s left is to march through them one by one and complete them. This is quite a reasonable way to approach problems that aren’t heavily hierarchical in nature and whose sub-parts are mutually independent, or at least mostly independent.

If, however, the problem is multi-dimensional in nature, and there are a significant number of different ways to decompose it, this doesn’t work. If there are at least two logically consistent ways to decompose a problem, then the underlying pieces are interdependent. These cross-dependencies mean that “linearizing” the problem and marching through each piece in sequence will result in a significant amount of redundant effort. Besides the extra work, redundancies wreak havoc because people are inherently inconsistent. The resulting differences in output (and any consequent problems) amplify the complexity, and again result in extra work.

An example of this is a problem that can be broken down into A, B and C. It can be further decomposed into 1, 2, 3 and 4. So if we decompose lexically first, we get the following pieces:

A1, A2, A3, A4, B1, B2, B3, B4, C1, C2, C3 and C4.

If we decompose numerically we get:

1A, 1B, 1C, 2A, 2B, 2C, 3A, 3B, 3C, 4A, 4B and 4C

Thus we have a two-level hierarchy which falls into 12 different sub-pieces that we need to complete in order to get the project done, with two reasonable ways to slice and dice it.

While this type of problem exists generally with any type of labor, it is very noticeable in software development. It is one reason the concept of ‘polymorphism’ arose. The essence of this concept in software is that a number of similar but different types of data all share a common interface, so that the code working on them doesn’t need a specific implementation for each different type. Thus, this approach addresses a decomposition over a finite number of static data-types. But the concept itself is general. It is not restricted to things that are particularly static, nor to data. It can be applied to dynamic data as well: data that is mutating in unexpected ways can still share a common interface (and support reflection). It can also be applied to static code, where all of the blocks of code share an identical interface. And of course, given those other two, it can apply to dynamic code as well.
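A minimal Python sketch of the data-type case (the shape types here are my own illustration, not from the text): several different types share one interface, so the calling code needs no per-type implementation, and new types -- even ones defined dynamically later -- work without changing it.

```python
class Circle:
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return 3.14159 * self.radius ** 2


class Square:
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2


def total_area(shapes):
    # One implementation handles every type that offers .area() --
    # no per-type branching, and later-added types need no changes here.
    return sum(shape.area() for shape in shapes)


print(total_area([Circle(1), Square(2)]))
```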

While it is a commonly used and very effective programming paradigm, it can also apply generally back to any basic problem decomposition. In our above example, there are 12 different pieces of work that need to be completed. But there are really only 7 unique items in that work list: A, B, C, 1, 2, 3 and 4. Their combinations may multiply quickly, but that doesn’t mean the only way to get the work done is to explicitly work through all 12 pieces. We might conclude that at maximum efficiency we need only complete 7 pieces of work, though I suspect that in practice there is some extra overhead required to bind the sub-pieces together. Still, the expectation would be that a polymorphic approach to handling the work would lie somewhere between 7 and 12. If it were 8, for instance, that would mean the most efficient way to solve the problem took only 2/3rds of the effort required to just pound through all of the pieces. That would be a considerable savings.
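That idea can be sketched in code (a toy illustration, assuming the problem factors cleanly along the two axes): implement one handler per letter and one per number -- 7 units of work -- and cover all 12 combined tasks by composing them, rather than hand-writing each combination.

```python
from itertools import product

# Seven implemented units of work: one step per letter (3)...
letter_steps = {
    "A": lambda s: s + "a",
    "B": lambda s: s + "b",
    "C": lambda s: s + "c",
}

# ...and one step per number (4).
number_steps = {
    "1": lambda s: s + "1",
    "2": lambda s: s + "2",
    "3": lambda s: s + "3",
    "4": lambda s: s + "4",
}


def run_task(letter, number):
    # Any of the 12 combined tasks is just a composition of two of the
    # 7 handlers -- no task gets its own hand-written implementation.
    return number_steps[number](letter_steps[letter](""))


all_tasks = ["".join(pair) for pair in product("ABC", "1234")]
print(len(all_tasks))  # 12 combinations, served by 3 + 4 = 7 pieces of code
```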

Now, at this point some readers are probably on the edge of their keyboards, but they likely fall into two different categories: a) those who disagree that there is a more efficient way of solving the problem, and b) those who see this whole discussion as obvious. I’ve certainly met both kinds of people in practice, although I think more people fall into the ‘a’ category. The reason I think this is more common is that it is easy for most people, when faced with a problem, to adopt tunnel vision. That is, they don’t focus on the ABCs, nor on the 123s; rather, they pick what they believe to be the one correct decomposition (lexical or numerical), then narrow their field of vision to deal only with a sub-problem like A1. While working on A1, they don’t want to hear anything about A2, and certainly don’t want to know about, or even consider, C4. They just accept that there are twelve things to do and start doing them.

Again, this shows up very clearly, and quite frequently, in programming. If you read a lot of code, it is fairly common to see a programmer belt out very specific handling for A1 in one location, and then find another, completely separate, yet nearly identical A2 somewhere else. Not only does this occur in code, it also pops up in the data and in the static data used for operational configuration. Software, these days, is highly redundant. It actually gets worse when you look at teams of programmers. Large to massive code bases seem to start with 2x - 3x redundancy, and I wouldn’t even want to guess at their worst factors. They just become ever-increasing seas of redundant data and code.
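A hypothetical before-and-after (the record fields and queue names are invented for illustration): the kind of near-duplicate A1/A2 handlers that accumulate in a code base, versus one handler parameterized by the only part that actually varies.

```python
# The redundant version: two handlers, copy-pasted, differing in one string.
def handle_a1(record):
    out = dict(record)
    out["status"] = "done"
    out["queue"] = "q1"
    return out


def handle_a2(record):  # nearly identical to handle_a1
    out = dict(record)
    out["status"] = "done"
    out["queue"] = "q2"
    return out


# The consolidated version: one handler, with the varying part as a parameter.
def handle_a(record, queue):
    out = dict(record)
    out["status"] = "done"
    out["queue"] = queue
    return out


print(handle_a({"id": 1}, "q1") == handle_a1({"id": 1}))  # True
```

The duplicated pair looks harmless at first, but any later fix to the shared logic now has to be made twice, and the copies inevitably drift apart.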

Even though polymorphism is a well-known, popular approach to architecting code, it is very unlikely that a significant proportion of most code-bases utilizes it across the board. There may be some sub-instances, but most often they are trivial uses. This continues to occur even though, oddly, one of the most famous modern adages of programming is the DRY principle -- don’t repeat yourself -- which came out of The Pragmatic Programmer book (and possibly series). It’s rather obvious that this wisdom isn’t being followed on a regular basis, quite simply because it’s really, really hard to accomplish in practice. If it were being attempted more often, we’d see it dominating far more of the technical conversations, and we’d see it show up more often in the vast amount of open code that exists. A huge number of modern technologies not only violate this principle, they actively encourage violating it. So while it is talked about, it is quite often swept under the rug in the rush to belt out the next version.

I do find this ironic because, certainly in software, we’ve seen a huge growth in the expectations for what software can do, and given the rapidly increasing -- out of control -- complexity of our societies, you’d think more and more people would start to look for efficiencies. But it seems the opposite is true of human nature. That is, the more complex things become, the more tunneled the vision of the people who have to deal with them. Our species seems to retreat towards the shallows the moment things get to be too much. This, it seems, sets up a feedback loop: people who took the long road end up late, which forces them to not analyze the next fork sufficiently, and so again choose a longer route. And then it never ends.

And it’s that type of vicious feedback loop that I’ve seen in many software development projects, both at the coding level and the organizational one. The projects get so ground down by slapping band-aids on unnecessary problems caused by redundancies that they lack the resources to avoid creating the redundancies in the first place. The work degrades into a loosely coupled collection of disorganized functionality that mitigates most of its own benefits. Not only have they chosen the least effective path, they’ve also opened up an ever-widening sinkhole for their own resources. And it seems that this type of problem comes directly from our intuitive approach to solving problems. We are most likely to not make the right choices (and then far too stubborn to backtrack).