Tuesday, December 10, 2019

Development Speed

Software is counter-intuitive. Sometimes to speed up development, you have to slow it down.

Let me explain.

It’s easy to get a small piece of software together. You belt out the basic code; you don’t need to worry about high falutin ideas like architecture, organization, etc. You just need code.

That works, but only as long as the software stays small. The technique is predicated on pounding out rather frail, hard-coded instructions that are very specific to the task at hand. That is fine, but it doesn’t scale. Not even a little bit.

Once the project has accumulated enough lines of code, or other forms of complexity, any change involves negotiating compromises that are fragmented across all of the source. Once that source exceeds someone’s ability to easily remember it and move around it seamlessly, attempts to fix or improve it slow down. They don’t slow down gradually; they basically walk off a cliff. Changes that might have required a few hours suddenly balloon into taking days or weeks. The quality of the changes declines as well, which feeds back into a cycle that makes all future changes harder too. Different parts of the code hit this vicious cycle at different times in the descent, but the chaos in one part of the system spreads.

If a project has started swirling downwards this way, the only real way out is to admit that the code has become a huge mess, and that the work necessary to stop this decay is essentially re-organizing huge swaths of the code base, with an eye on defragmenting it and laying down some rather strict organization. That kind of work is really slow. You can’t just leap into it; it takes time and effort to diligently, slowly rearrange the code at multiple levels, back up to a place where it can move forward again.

But just cleaning up the code isn’t the only issue. Over-specific code and redundant code both occur in projects that are falling apart, and both of these problems need addressing.

Redundant code is easier, since you can just look around the codebase and find a lot of similar functions or config data. If they are close enough to each other, it is an easy change to merge them together and keep only one copy. Again it is slow, but done well it gives a huge lift to the quality and tends to make a lot of bugs just disappear, so its payoff is obvious.
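A minimal sketch of that kind of merge, with invented names (the validators and their rules are hypothetical, just to show the shape of the change):

```python
# Hypothetical sketch: two near-duplicate validators found in different
# corners of a codebase, then merged into a single shared copy.

def validate_customer_name(name):
    if name is None:
        return False
    name = name.strip()
    return 0 < len(name) <= 64

def validate_product_name(name):
    if name is None:
        return False
    name = name.strip()
    return 0 < len(name) <= 64

# Merged: one copy, so a bug fixed here is fixed everywhere it is used.
def validate_name(name, max_len=64):
    """Shared validation for any short display name."""
    if name is None:
        return False
    return 0 < len(name.strip()) <= max_len
```

Once the single copy exists, the old call sites are pointed at it and the duplicates deleted, which is exactly where the bug count tends to drop.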

Over-specific code is a little harder to tackle.

Most development projects go on for years, if not decades. During the life of the project, the focus generally starts at one specific point in the domain and spreads, much like construction, over the surrounding areas. When the project is started, people are often tunnel-visioned on that initial starting point, but taking a big project one tiny step at a time is painfully slow.

Instead of pounding out code for each specific case, when there are a lot of them in the same neighborhood, the optimal method over the total lifespan of the development is to batch these cases together and deal with them in one large blow. This is an off-handed way of talking about generalization and abstraction, but it gets the point across: picking the right abstraction, one that repeats over and over again in the domain, will accelerate a project through that territory at light speed. The cost is often just marginally more effort in the beginning, but the payoff is massive.
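As a hedged sketch of what batching the cases can look like (the report names, columns, and filters are all invented): instead of one hand-written function per case, the part that actually varies is captured as data, and the repeating mechanics are written once.

```python
# Hypothetical sketch: many specific report cases collapsed into one
# table-driven generalization. Adding a new report is a one-line entry.

REPORTS = {
    # name -> (columns to emit, row filter): the part that varies
    "sales":   (["date", "amount"], lambda r: r["amount"] > 0),
    "returns": (["date", "amount"], lambda r: r["amount"] < 0),
}

def run_report(name, rows):
    """Generic driver: the mechanics that every case shares."""
    columns, keep = REPORTS[name]
    return [[row[c] for c in columns] for row in rows if keep(row)]
```

The tradeoff is exactly as described above: slightly more thought up front for the table and driver, in exchange for each later case costing almost nothing.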

Often the counter-argument to the above is that it is impossible to guess which problems will repeat significantly enough in the domain to make a reasonable guess at what to abstract. However, that has never been a valid point: software starts at a particular spot, and the functionality spreads from there. It is all connected. One doesn’t start writing an accounting system and end up with a video game; it doesn’t work that way. So the possible paths for the code are connected and intertwined, and for the most part obvious. Given that, some prior experience is enough to lay out the pathways for a very long time with reasonable accuracy, subject to radical shifts like a startup that pivots. So basically, if you bring in some experienced people, along with their effort you get their foresight, which is more than enough to maximize the efficiencies over a long period of time.

On top of all of these other efforts, hard lines need to be drawn through the code to separate it into smaller pieces. Without that separation, code gets fragmented quickly and the problems come right back again.

The test that the separation is good enough and clean enough is that it can be easily documented with fairly simple diagrams. If the sum of the details is so messy that it is a Herculean job to create an accurate diagram of the parts, then it is disorganized. The two go in lockstep with each other. If you can create a simple diagram that only kinda reflects the code, then quite obviously you want to refactor the code to match that diagram.

These lines then form an architecture, which needs to be preserved and enhanced for any and all future extensions. It should be noted, though, that as the general size of the code base grows, it is quite common for it to outgrow its current architecture and then need to be entirely reorganized along a different set of parameters. That is, no specific organization of code is scalable. It is all relative to its size and complexity. As the code grows, any organization needs to change with it.

Given the above issues, it is inevitable that during the life of a codebase there are times when it can run quickly without any consequences, and times when everyone has to slow down, rearrange the project, and then start working through the consequences.

Avoiding that will bring on disaster faster than just accepting it and rescheduling any plans that are impacted. If a project decays enough, it reaches a point where it cannot move forward at all, and is either permanently stuck in maintenance mode or should be shelved and restarted from scratch. The only problem with restarting is that if the factors that forced it into that cycle in the first place are not corrected, then the next iteration of the code will suffer the exact same problem. Basically, if the developers don’t learn from history, then they will absolutely repeat it and arrive back at the same result.

Wednesday, August 21, 2019

Bugs as a Reflection of Coding Issues

A long, long time ago I read a book about the most popular programming errors in C. Off-by-one was in the top ten.

I have made thousands of off-by-one bugs in my career over the decades. It is easily my biggest mistake, and you’d think that knowing this would let me reduce their frequency.

I suspect that the problem comes from the way I see code. When I am focused on the higher-level mechanics and flow, I easily ignore the low-level details. So, if I am coding a complex algorithm that is manipulating lots of arrays, whether their bounds are symmetric or asymmetric (inclusive or exclusive) is not particularly interesting to me. Depending on the tech stack, it’s not uncommon to have to switch between the two a lot, and when I am clearly not paying attention, I run a 50/50 chance of getting it wrong. Which I do, a lot.

Now, knowing this, I am not able to change my coding mindset, but that doesn’t mean I can’t correct for the problem. So I do. When I am testing to see that the code I’ve written matches my understanding of its behavior, one of the key places I pay attention to is the index bounds. So, if there is an array, for example, I need to add at least a few elements, then remove a few. That is one of the key minimal tests before I can trust the work.
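A minimal sketch of that kind of boundary check (the helper is invented, just to show where the off-by-one tests go): the interesting inputs are the empty case, exactly at the bound, one past it, and one short of it.

```python
# Hypothetical sketch: a helper whose bounds are the likely bug site.
# The minimal tests deliberately probe just below, at, and past the edge.

def last_n(buffer, n):
    """Return the last n items of buffer (all of them if fewer than n)."""
    if n <= 0:
        return []
    return buffer[-n:]
```

It is exactly cases like `last_n([1, 2, 3], 3)` versus `last_n([1, 2, 3, 4], 3)` where an inclusive/exclusive mix-up would show up.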

As a consequence, even though I make a large number of off-by-one bugs, it is actually very rare for them to get into production. If they do, I generally take that as a warning sign that the development process has at least one significant problem that needs to be addressed right away.

Generalizing, because we model the solutions in our head, then code them into the machines, that code can be no stronger than our internal models. Well, almost. For a large set of code that has been worked on by many different programmers, the strength is an aggregate of the individual contributions and how well they overlap with each other.

What that means is that you could segment each and every bug in the system by its collective authors and use that to refine the development process.

Suppose, for example, that the system is plagued by poor error handling. That’s a symptom of not enough time, or of the developers not considering the real operational environment for the code. Time is easy to fix, but teaching programmers to look beyond the specified functionality of the code and consider all of the other possible failures that can occur is tricky. Either way though, the bugs are providing explicit feedback on issues within the development process itself.

It’s actually very similar for most bugs. As well as being problems in themselves, they shed light on the overall analysis, design, coding, and testing of the system. A big bug list for a medium-sized project is an itemized list of how the different stages of development are not functioning correctly. If, for example, the system is doing a poor job of handling the workflow of a given business problem, it’s often due to incomplete or disorganized analysis. If the system can’t keep up with its usage, that comes from the technical design. If the interfaces are awkward and frustrating the user, then the UX design is at fault. And of course, stupidly embarrassing bugs getting out to production are instances of testing failure.

The way we build software has a profound effect on the software we produce. If we want better software, then we need to change the way we build it, and we need to do this from an observational perspective, not just speculative opinions (since they likely have their own limited-context problems).

Wednesday, July 31, 2019

Breaking it Down

It is generally understood that the best way to solve a large problem is by breaking it down, decomposing it into bite-sized pieces.

While that overall approach is easy to say, it is actually quite difficult to list out the specific steps for how to decompose problems into their ‘atomic’ components. As well, it is often forgotten that once all of these pieces have been decided upon, they still need to be assessed together in order to ensure that they still cover the original problem.

For that first issue, large problems are intrinsically complex; that’s their definition. The full weight of all of their details exceeds the ability of any single person to internally understand or visualize them. That’s why they are considered large.

To get beyond their size, we essentially rely on layering. We take the original problem and break it down one level at a time into a new set of smaller problems. Each of these is a layer in the solution.

Obviously adding layers increases complexity for the overall problem, so it is vital that each new layer only contains pieces that are independent from each other. That is, the complexity needs to be split cleanly; any extra complexity from adding the layer should be less than the complexity of the underlying pieces.

If we were to think of this more formally, the sum of the new pieces should be less than the altered whole. That seems easy and obvious, but it entirely relies on the pieces being independent of each other. If they aren’t, then in the worst case dependent pieces inherit each other’s complexity; their individual complexities are combined. To be specific, take a problem P and break it into 3 pieces, c1, c2, and c3. If c1 and c2 are intertwined, then in the worst case we get c(P) + L <= (c1 + c2) + (c2 + c1) + c3, where L is the cost of adding a new layer. By decomposing the problem into a ‘blurry’ layer, we’ve essentially increased the artificial complexity beyond any benefits of adding that layer.
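With purely illustrative numbers (assuming, say, a whole of complexity c(P) = 9, split into three pieces of complexity 3 each, with a layer cost of L = 1), the worst case looks like:

```latex
% clean split: the pieces stay independent
c_1 + c_2 + c_3 + L = 3 + 3 + 3 + 1 = 10

% blurry split: c_1 and c_2 inherit each other's complexity
c(P) + L = 9 + 1 = 10 \;\le\; (c_1 + c_2) + (c_2 + c_1) + c_3 = 6 + 6 + 3 = 15
```

The numbers are made up, but they show the shape of the loss: the two intertwined pieces each carry the weight of both, so the decomposition costs more than the problem it was meant to shrink.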

That is the quantitative cost, but there is a human cost as well. The first and second pieces have each only been reduced to around 2/3rds of the whole, not down to the 1/3rd that we needed to bring this part of the problem down to a manageable size. This builds up. So, if it should have taken 4 layers to contain the solution, we might need to double that to 8 because of the overlaps.

This points back again to the importance of ensuring that any breakdown is only useful if the pieces themselves are independent.

The secondary problem, particularly when scaling the solution, is having gaps between the pieces. If the pieces fit together poorly, then slightly different reassemblies will create new problems; the solution isn’t scalable. Most solutions only have sufficient value if they are applied more than once, but gaps can interfere with being able to do that.

Both issues, overlaps and gaps, strongly diminish the decomposition. If we add in enough blurry layers, we have seen in practice that we can bloat the complexity exponentially or worse. In that sense, while layers are necessary to solve the problem, they are also risky.

So, the underlying issue is: given a big problem, how do you subdivide it cleanly into independent, atomic pieces? The unfortunate answer is that you can only really do this if you fully understand all of the details. But often that is not possible, and as previously mentioned, the problem is already too large for any individual to handle.

The approximate answer is that we need to get close enough to a reasonable solution, then slowly converge on the best answer.

To do this, decomposition requires many forms of definition and categorization. We can start these loosely and continually revise them. We want as many instances of the problem as we can get our hands on. For each of them, we can precisely define them with all of their known details, then parse that information into ‘syntactic’ components, essentially verbs and nouns. From here we can split any clashes in, say, nouns, subdividing them until all of the obvious overlaps are gone. For each instance, this gives us a breakdown of its associated attributes, basically the smallest verbs and nouns. With this set, we can partition all of the instances from each other. In doing this, we have to count what are really the dimensions of variability (since the axes provide a natural decomposition).

It’s worth noting that any categorization must match the dimensionality of what it covers. If you construct a 1D categorization over 2 dimensions, it increases the likelihood that one of the dimensions will span multiple categories, which becomes a dependency, so it bumps up the complexity. However, if you have 2 distinct categorizations, one for each of the dimensions, then you can work with the permutations at a higher layer to combine them, at the cost of adding in that extra layer. In that way, as we are making sense of all of the special cases and categorizing them, we are also building up these layers. The layering itself, though, can be seen as a more general instance of another problem that needs its own solution, so it is upwardly recursive.

A somewhat related programming example is that you want to define an architecture (overall organization) for a fairly large system. You might split all of the code by its usage, say between an interactive interface and a batch reporting facility. But along with usage, there might be shared commonalities for specific data types like strings. The usage of the code is a different dimension from the shared data-structure manipulation code. We’d like the manipulations to only be specified once (so they are independent) but they are equally likely to be necessary on either side of the usage breakdown. We need them for the interface and we need them in batch. Without giving it much consideration, programmers often bind the usage dimension to the higher architectural arrangements but keep a hybrid category called a shared library available to span any architectural divide. Occasionally, you see this done cleanly, but most often the failure to admit these are different dimensions leads to a hodgepodge of redundant and shared code. So, because of that, it is an easy issue to spot in a large codebase, but an extraordinarily hard one to solve with a trivial solution.
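A minimal sketch of keeping those two dimensions separate (all module and function names here are invented): the data-manipulation code is specified once, independent of the usage split, and both sides of the usage breakdown call into it.

```python
# Hypothetical sketch: the string-manipulation "dimension" lives in one
# shared place, independent of the interactive-vs-batch "usage" dimension.

# --- shared manipulation code (specified once) ---
def truncate(text, width):
    """Clip text to width, marking the cut with an ellipsis."""
    return text if len(text) <= width else text[:width - 3] + "..."

# --- interactive interface (one side of the usage breakdown) ---
def render_screen_row(name):
    return f"| {truncate(name, 20):<20} |"

# --- batch reporting (the other side of the usage breakdown) ---
def render_report_row(name):
    return truncate(name, 80)
```

The point of the structure is that a fix to `truncate` lands on both sides of the usage divide at once, instead of being patched redundantly in each.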

Given all of the above, to really get going with a large decomposition means collecting a large number of examples, breaking them down into attributes, identifying the dimensions, then drawing all of the vertical and horizontal lines between them. At that point, one can cross-check that artificial examples do not fit into multiple boxes and that any permutation has only one unique location. For completeness, the boxes need names that truly reflect their content. At this point, the first step is done.

As mentioned at the start, after decomposition, all of the boxes need to be specified fully, then recomposed to see that they still work together. If everything were truly independent, and it was obvious, then this next step could be skipped, but again this problem is large enough so that neither of those preconditions exists.

Given a small enough box, a capable person can now produce a solution, but the overall context may have been lost. This box may contain a significant amount of variability and it is this intrinsic freedom that is dangerous. Thus, each box still contains a large number of possibilities, but not all of these solutions will interact correctly with the whole.

Another issue is that the necessary size of the boxes is dependent on individuals. Some people can cope with larger boxes, some cannot, so the boxes may still need further decomposition.

At some point during recomposing, it is likely that some missed dependency will creep in. One little box in one part of the solution will be found to be unexpectedly tied to another box somewhere else. Given the size, scope, and general usage of the solution, there are multiple ways of handling this. The best, but most time-intensive, is to roll the dependency upwards until it reaches a layer where the cross-dependency exists, then recategorize that layer and all of the affected children.

Sometimes, due to operational or time issues, that is not possible, so the alternative is documentation. The dependency is noted in the attached documentation but the instances are handled redundantly. That type of documentation needs to be bound to the solution, and to stay that way for its entire usage. The worst thing to do is to ignore or dismiss the problem, as it is most likely to set other problems into motion.

A major concern with the above is the fear that rigorously following it will lead to too many layers. Some layers exist mainly to bring down the complexity; others are tightly bound to discovered dimensions. Obviously, invalid layers that do neither are just increased complexity without benefit, but for the necessary layers, the underlying degree of sophistication of the solution is for the most part dependent on their existence. If you remove them, the solution is either unknowable or oversimplified. In the first case, an unknowable amount of complexity will not be predictable and so is not trustworthy. Eventually, it won’t be the solution, but rather take its place as part of the problem itself, so it is a rather negative contribution. Being oversimplified is similar. It won’t really solve the full problem and will spin off lots of sub-problems as part of its usage. Generally, things get worse, but not necessarily linearly.

Relying on a faulty solution may take a long time to trigger the full weight of the mistake. It might seem like it worked, but that’s misleading. Because of that, for a given problem there is a bound on the necessary number of layers required for a tight-fitting solution. Comprehension and dimension handling open the door for some wiggle room, but it is safe to say that there are some problems that need a nearly fixed number of layers to solve properly. So, if the sophistication of the problem requires around 20 layers, but the proposed solution only has 5, we can pretty well infer that that specific solution will not really handle that set of problems. That at some point, it will go horribly wrong. If the proposed solution has 30 layers, again we can often infer that it will take longer than necessary to implement it and that it could be difficult to extend when needed. There are always a lot of possible solutions, but very few will fit properly.

With all of the above in mind, identifying a problem and then decomposing it into pieces to build a solution has a lot of small subtleties that, for problems that are highly intertwined, make it tough to get real workable solutions. From a dependency standpoint, problems that are trivially independent are trivially solvable, but all it takes to throw that off is non-obvious dimensions that weave their way through the decomposition. In a real sense, that is why most problems look a lot easier on the outside than they do on the inside, once you’ve acquired deep knowledge about them. In that depth, the details lead to interconnections, which bind the parts at a distance. Not seeing those is the beginning of the end.

Tuesday, March 19, 2019

Cooperation, Competition and Control

Life is a dynamic process. All forms of life compete for the ability to propagate.

Our species bands together; this cooperation gives us a competitive advantage. Within our societies, we compete with each other for control of any of the resources. We are driven to do this.

Competition and cooperation intertwine at all levels within our interactions.

When a competition becomes stagnant, incentives grow to subvert the underlying cooperation that enables it. If some of the competitors bend the rules, to remain in the competition, the rest have to as well. Each iteration of the game converges to being stale, so the need to subvert the cooperation increases with time and the individual stability of the players.

Outside enforcement or a steady turnover of the players tends toward a fairer competition. Most new competitions start reasonably fair, but without correction they will not remain that way.

Control, in an uncontrollable world, is the prize. It is best utilized as soon as it is achieved, since it may be increasingly lost with time. Through control, we can offset or at least delay other competitions.

A stronger base cooperation enables more intense competition. The two extremes cycle in dominance; one always pushes back on the other.

The game sometimes plays out across generations, but it isn’t always obvious to the players.

As some players compete and push their way up through the ranks, they become willing to do anything to move into the best position.

Some people just don’t want to play; they favor more cooperative environments.

A desire to win seems to be the stronger deciding factor, but can sometimes backfire, depending on the game.

We build a lot of myths around competing fairly, but most competitions are well past that stage. Most people outside the game are unaware of the status and most of the players would rather not admit it.

A stable competition must constrain the game. Stability comes from the outside; it must be tied to our most basic need to cooperate. The outsiders must maintain their ability to enforce the rules of the game. There is no naturally inherent stability: time will always pass, and the game will always get stale.

The rules of the game need verification. Bends or breaks must be detectable. Any enforcement must know when to act.

Limited play time helps, but that can be subverted by cooperating groups which extend the context. If one group’s horizon significantly exceeds the field’s, then all other players will bind to different groups, and the effect is the same as with individuals; it just plays out a little slower.

Cooperation drives us at a basic level, but the need to compete and to gain control is dominant in our societies. We will compete at anything and everything. If we want a better world, we need to address this at the core; to accept it and to allow it, but also to contain it so it remains positive. Otherwise, the same decaying cycles just play out over and over again.

Thursday, February 21, 2019

Software Optimizations

Most software can execute faster.

There are many ways that software can be optimized to improve its performance. Most of these techniques are well-understood, but they still need to be used with caution, in that they can accidentally harm other attributes of the system.

The most obvious way to speed up code is to no longer do useless work.

One common form of wasted effort is to redundantly copy the same data to many different areas of memory. Another is to parse the data into smaller pieces and then reassemble it later or vice versa. Removing these from code should get it closer to the minimal effort, but not necessarily the minimum.
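A small sketch of the parse-then-reassemble form of waste (the helpers are invented): the wasteful version re-parses the same line on every access, while the lean one parses once and makes no extra copies.

```python
# Hypothetical sketch: removing useless work. Both functions produce the
# same result; only the amount of effort differs.

def get_fields_wasteful(line, indexes):
    out = []
    for i in indexes:
        parts = line.split(",")      # the line is re-parsed for every index
        out.append(parts[i])
    return out

def get_fields(line, indexes):
    parts = line.split(",")          # parsed once, then reused
    return [parts[i] for i in indexes]
```

This is the kind of change that moves code closer to the minimal effort without touching the algorithm itself.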

Sometimes, however, extra copies of data are necessary for security, locking or architectural reasons. These types of redundancies can stay, if they are justified. For sanity reasons, most incoming data for a large system should be fully parsed as soon as possible. Exporting this data may legitimately require reassembling it. This too is fine.

Switching to a better algorithm, quite often, can afford very large time savings. There is a huge amount of knowledge available about algorithms and their performance attributes. Significant research is always required.
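A hedged sketch of such a swap (function names invented): testing membership against a list is O(n) per probe, so the overall pass is O(n*m); building a set first makes each probe O(1) on average, bringing the pass down to roughly O(n + m).

```python
# Hypothetical sketch: same result, better algorithm.

def common_ids_slow(ids_a, ids_b):
    return [i for i in ids_a if i in ids_b]   # ids_b scanned on every probe

def common_ids_fast(ids_a, ids_b):
    seen = set(ids_b)                          # hash set built once
    return [i for i in ids_a if i in seen]     # O(1) average per probe
```

On small inputs the difference is invisible, which is why this class of optimization only pays off once the data, and its growth, is understood.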

Sometimes shifting between time and space works really well. We can rebalance the code to shift this resource usage. In some cases though, there are natural boundaries for reductions, so the embedded information doesn’t exist to optimize the code.

Algorithmic optimizations are often the most effective, but they require a great deal of knowledge and often a lot of time to implement properly.

Beyond that, memoization, which is the reuse of earlier computations, can produce decent optimizations. Caching is the most famous of these, but care must be taken to distinguish between read-only and write-through caching. They are very different from each other and frequently confused; a bad implementation can cause weird bugs.

The big trick with memoization is not in saving the value, but rather in knowing ‘precisely’ when that value is no longer of any use. These types of optimizations require a strong understanding of usage and data frequency. Generalized solutions can help (and hurt), but specific solutions will always produce better results.
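A tiny sketch of that point (the class is invented): saving the value is the easy half; the real work is invalidating it at precisely the moment the underlying data changes.

```python
# Hypothetical sketch: a memoized total where every write must invalidate
# the saved value, otherwise stale results leak out.

class Totals:
    def __init__(self):
        self._rows = []
        self._total = None          # memoized result; None means "stale"

    def add(self, amount):
        self._rows.append(amount)
        self._total = None          # the write invalidates the memo

    def total(self):
        if self._total is None:     # recompute only when stale
            self._total = sum(self._rows)
        return self._total
```

Forgetting the single invalidation line in `add` is exactly the kind of weird bug the paragraph above warns about.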

An example of this is compression. Data can be taken down close to its information-theoretic minimum; beyond that, some data is lost (which can also be acceptable). Reducing the size of the data is accomplished by exploiting the redundancies within it. This is also a classic time vs. space tradeoff.

Parallelizing computation is another strong form of optimization. To make it work on interconnected data usually requires synchronization primitives like locking. Locking can be coarse-grained or fine-grained, with the latter usually providing better performance at the cost of more management overhead. Locking gets extraordinarily challenging when it occurs outside of a single process.
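A minimal sketch of the coarse-versus-fine distinction (the counters are invented): one lock serializes every update, while per-bucket locks let updates to unrelated keys proceed concurrently, at the cost of managing more locks.

```python
# Hypothetical sketch: coarse-grained vs fine-grained locking.
import threading

class CoarseCounter:
    def __init__(self):
        self._lock = threading.Lock()
        self._counts = {}

    def incr(self, key):
        with self._lock:                       # every key contends here
            self._counts[key] = self._counts.get(key, 0) + 1

    def get(self, key):
        with self._lock:
            return self._counts.get(key, 0)

class FineCounter:
    def __init__(self, buckets=16):
        self._locks = [threading.Lock() for _ in range(buckets)]
        self._counts = [{} for _ in range(buckets)]

    def _bucket(self, key):
        return hash(key) % len(self._locks)

    def incr(self, key):
        b = self._bucket(key)
        with self._locks[b]:                   # only same-bucket keys contend
            self._counts[b][key] = self._counts[b].get(key, 0) + 1

    def get(self, key):
        b = self._bucket(key)
        with self._locks[b]:
            return self._counts[b].get(key, 0)
```

The fine-grained version only wins under real contention; under light load the extra lock management is pure overhead, which is the caution the section opens with.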

Locking algorithms spread across different computers are bounded by the TGP (Two Generals Problem), which in itself influences impossibility results like CAP and FLP. Generally, this is caused by an inherent ambiguity (missing information) within the communication between the separated computations, getting worse as the underlying reliability of the communication weakens. This is sometimes described as transactional integrity as well.

In general, we can optimize code by seeking out data independence. If for a given computation, there is some dependence for the result on some other piece of data, then that relationship bounds the minimum amount of work. All outputs must be produced from some finite set of inputs. This is true for all computations. As there is a precise minimum for information, there also exists one for computation.

Optimization attempts then can start by observing for a given context that there will never be ties between any specific set of variables and using that information to reorder the work involved to get closer to the minimum. That is, we can conjecture that for any specific output, there are a finite number of both computations and data that form a minimum directed acyclic graph (DAG) with all inputs as leaves. Then there should exist a minimal such DAG (relative to the computational primitives). This can be applied mechanically to any set of instructions, for a given set of data, as it is bounded by a specific context. Fill in these unknowns and the minimal set of work is explicit.
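A toy sketch of that conjecture (the graph and its nodes are invented): with the dependencies written down as a DAG over the inputs, asking for one output evaluates only the nodes it actually depends on, and everything outside that subgraph is skipped as provably unnecessary work.

```python
# Hypothetical sketch: outputs as a DAG over inputs. Each node lists its
# dependencies and the function that combines their values.

GRAPH = {
    "tax":   (("gross",),       lambda g: g * 0.25),
    "net":   (("gross", "tax"), lambda g, t: g - t),
    "label": (("name",),        lambda n: n.upper()),  # unrelated to "net"
}

def evaluate(node, inputs, cache=None):
    """Evaluate one output, touching only its own subgraph."""
    cache = {} if cache is None else cache
    if node in inputs:                  # leaves are the raw inputs
        return inputs[node]
    if node not in cache:               # each node computed at most once
        deps, fn = GRAPH[node]
        cache[node] = fn(*(evaluate(d, inputs, cache) for d in deps))
    return cache[node]
```

Asking for "net" never touches "label" or the "name" input, which is the sense in which the DAG makes the minimal set of work explicit.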

Some algorithmic optimizations are tricky in that they would require currently unknown relationships to exist in order to find the actual minimum effort. We can, however, come close to the minimum, even if we can’t get there yet.

Most other optimizations are easier, in that they really come from understanding the data, its usage and the underlying functioning of the computers themselves (sometimes optimizations at one level exist to counterbalance bad optimizations at a lower level).

Most code is written as the ‘obvious first try’, so most of the time there is plenty to optimize. However, most programmers do not fully understand the data or the context, which is why we warn younger coders not to prematurely attempt to optimize. They do not yet have a full enough understanding to do it correctly, and bad optimizations, by definition, will use more resources, not less. Poor optimizations can also impair readability or extensibility. Only good optimizations will help.