The standard approach to getting a big project accomplished is to identify the problem, come up with an overall solution, and then keep breaking it down into smaller, more manageable pieces. Once these are small enough -- specific tasks -- all that's left is to march through them one by one and complete them. This is quite a reasonable way to approach problems that aren't heavily hierarchical in nature and whose sub-parts are mutually independent, or at least mostly independent.
If, however, the problem is multi-dimensional in nature and there are a significant number of different ways to decompose it, this doesn't work. If there are at least two logically consistent ways to decompose a problem, then the underlying pieces are interdependent. These cross-dependencies mean that "linearizing" the problem and marching through each piece in sequence will result in a significant amount of redundant effort. Besides the extra work, redundancies wreak havoc because people are inherently inconsistent. The resulting differences in output (and any consequential problems) amplify the complexity, and again result in extra work.
An example of this is a problem that can be broken down into A, B and C. It can be further decomposed into 1, 2, 3 and 4. So if we decompose lexically first, we get the following pieces:
A1, A2, A3, A4, B1, B2, B3, B4, C1, C2, C3 and C4.
If we decompose numerically we get:
1A, 1B, 1C, 2A, 2B, 2C, 3A, 3B, 3C, 4A, 4B and 4C.
Thus we have a two-level hierarchy that falls into 12 different sub-pieces that we need to complete in order to get the project done, with two reasonable ways to slice and dice it.
While this type of problem exists generally with any type of labor, it is very noticeable in software development. That is where the concept of 'polymorphism' arose. The essence of this concept in software is that a number of similar but different types of data all share a common interface, so that the code working on them doesn't need a specific implementation for each different type. Thus, this approach addresses a decomposition over a finite number of static data-types. But the concept itself is general. It is not restricted to things that are particularly static, nor to data. It can be applied to dynamic data as well: data that is mutating in unexpected ways can still share a common interface (and support reflection). It can also be applied to static code, where all of the blocks of code share an identical interface. And of course, given those other two, it can apply to dynamic code as well.
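As a minimal sketch of the basic idea (the record types here are hypothetical, invented purely for illustration), the calling code only ever talks to the shared interface, so it never needs a separate implementation per concrete type:

```python
from abc import ABC, abstractmethod

class Record(ABC):
    # Every record, whatever its internal shape, exposes the same interface.
    @abstractmethod
    def summary(self) -> str:
        ...

class CustomerRecord(Record):
    def __init__(self, name: str):
        self.name = name
    def summary(self) -> str:
        return f"customer: {self.name}"

class InvoiceRecord(Record):
    def __init__(self, total: float):
        self.total = total
    def summary(self) -> str:
        return f"invoice: {self.total:.2f}"

def report(records: list[Record]) -> None:
    # No branching on the concrete type; one piece of code covers them all.
    for r in records:
        print(r.summary())

report([CustomerRecord("Ada"), InvoiceRecord(99.5)])
```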
While it is a commonly used and very effective programming paradigm, it can also be applied back to any basic problem decomposition. In our above example, there are 12 different pieces of work that need to be completed. But there are really only 7 unique items in that work list: the three letters and the four numbers. The combinations may multiply, but that doesn't mean that the only way to approach getting the work done is to explicitly work through all 12 pieces. We might conclude that at maximum efficiency we need only complete 7 pieces of work, but I would suspect that in practice there is some other overhead required in order to bind together the sub-pieces. Still, the expectation would be that a polymorphic approach to handling the work would lie somewhere between 7 and 12. If it were 8, for instance, that would mean that the most efficient way to solve the problem takes 2/3rds of the effort required to just pound through all of the pieces. That would be a considerable savings.
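A rough sketch of what that looks like, with the letter and number handlers standing in for whatever the real sub-pieces are (all names here are assumptions for illustration): write the 3 letter pieces and the 4 number pieces once each, plus a small amount of binding logic, and all 12 combinations are covered.

```python
# Hypothetical stand-ins for the real work: 3 letter pieces and 4 number pieces.
letter_steps = {
    "A": lambda ctx: ctx + ["did A"],
    "B": lambda ctx: ctx + ["did B"],
    "C": lambda ctx: ctx + ["did C"],
}
number_steps = {
    1: lambda ctx: ctx + ["did 1"],
    2: lambda ctx: ctx + ["did 2"],
    3: lambda ctx: ctx + ["did 3"],
    4: lambda ctx: ctx + ["did 4"],
}

def run(letter: str, number: int) -> list[str]:
    # The "binding" overhead: gluing a letter piece to a number piece.
    return number_steps[number](letter_steps[letter]([]))

# 7 implementations (3 + 4), yet all 12 combinations get done.
for letter in "ABC":
    for number in (1, 2, 3, 4):
        print(letter, number, run(letter, number))
```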
Now, at this point some readers are probably on the edge of their keyboards, but they likely fall into two different categories: a) those who disagree that there is a more efficient way of solving the problem, and b) those who see this whole discussion as obvious. I've certainly met both kinds of people in practice, although I think more people fall into the 'a' category. The reason I think this is more common is that it is easy for most people, when faced with a problem, to adopt tunnel vision. That is, they don't focus on the ABCs, nor on the 123s, but rather they pick what they believe to be the one correct decomposition (lexical or numerical), then narrow their field of vision down to a single sub-problem like A1. While working on A1, they don't want to hear anything about A2, and certainly don't want to know about, or even consider, C4. They just accept that there are twelve things to do and start doing them.
Again, this shows up very clearly, and quite frequently, in programming. If you read a lot of code, it is fairly common to see a programmer belt out very specific handling for A1 in one location, and then find another, completely separate, yet nearly identical A2 somewhere else. Not only does this occur in code, it also pops up in the data and in the static data used for operational configuration. Software, these days, is highly redundant. It actually gets worse when you look at teams of programmers. The large to massive code bases seem to start with 2x - 3x redundancy, and I wouldn't even want to guess what their worst factors are. They just become ever increasing seas of redundant data and code.
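A small before-and-after sketch of that kind of duplication (the functions and their contents are hypothetical, just to make the pattern concrete): two near-identical handlers written separately, then the single shared version parameterized by the one part that actually differs.

```python
# Hypothetical duplicated handlers, written independently in two corners of a code base.
def handle_a1(value: str) -> str:
    cleaned = value.strip().lower()
    return f"A1:{cleaned}"

def handle_a2(value: str) -> str:
    cleaned = value.strip().lower()   # the same work, re-implemented
    return f"A2:{cleaned}"

# The shared version: one implementation, parameterized by the part that differs.
def handle(tag: str, value: str) -> str:
    cleaned = value.strip().lower()
    return f"{tag}:{cleaned}"

assert handle("A1", "  Foo ") == handle_a1("  Foo ")
assert handle("A2", "  Foo ") == handle_a2("  Foo ")
```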
Even though polymorphism is a well-known, popular approach to architecting code, it is very unlikely that a significant proportion of most code-bases utilizes it across the board. There may be some sub-instances, but most often they are trivial uses. This continues to occur, while oddly one of the most famous modern adages of programming is the DRY principle -- don't repeat yourself -- which came out of The Pragmatic Programmer book (and possibly series). It's rather obvious that this wisdom isn't being followed on a regular basis, quite simply because it's really, really hard to accomplish in practice. If it were being attempted more often, we'd see it dominating far more of the technical conversations, and we'd see it show up more often in the vast amount of open code that exists. A huge number of modern technologies not only violate this principle, they actively encourage violating it. So while it is talked about, it is quite often swept under the rug in the rush to belt out the next version.
I do find this ironic because, certainly in software, we've seen a huge growth in the expectations for what software can do, and given the rapidly increasing -- out of control -- complexity of our societies, you'd think more and more people would start to look for efficiencies. But it seems the opposite is true of human nature. That is, the more complex things become, the more tunneled the vision of the people who have to deal with them. Our species seems to retreat towards the shallows the moment things get to be too much. This, it seems, sets up a feedback loop: people who took the long road arrive late, which forces them to not analyze the next fork sufficiently and again choose a longer route. And then it never ends.
And it's that type of self-reinforcing feedback loop that I've seen in many software development projects, both at the coding level and the organizational one. The projects get so ground down by slapping band-aids on the unnecessary problems caused by redundancies that they lack the resources to avoid creating them in the first place. The work degrades into a loosely coupled collection of disorganized functionality that mitigates most of its own benefits. Not only have they chosen the least effective path, they've also opened up an ever widening sinkhole for their own resources. And it seems that this type of problem comes directly from our own intuitive approach to solving problems. We are all too likely to not make the right choices (and then far too stubborn to backtrack).