The Programmer's Paradox: Organization

Disorganization is extremely dangerous for a software development project. No matter what code quality is delivered by the programmers, if it is just dumped into a ball of mud it will grow increasingly unstable. Software rusts quite quickly and cannot be maintained if the costs to read and modify it are unreasonable. Disorganization significantly amplifies those costs.

Organization applies both to the output, such as the data and code, and to the processes used for all five stages of software development. If the right work is not being done at the right time, the resulting inefficiencies sap the time and morale of the project, causing the work to falter. Once things have become messy, progress gets painfully slow. Organization thus applies not only to the end, but to the means as well.

What is organization? It's a fairly simple term to claim, but a rather tricky one to define precisely. My favorite definition is the 17th (or 18th) century proverb "a place for everything and everything in its place" but that actually turns out to be a rather incomplete definition. It works as a start, but needs a little bit of refining:

A place for everything.
Everything in its place.
A similarity to everything in the same place.

While this definition may seem to only revolve around the physical world, it does actually work for the digital one as well.

'Everything' for a computer is code and data. There is actually nothing else, just those two, although there are plenty of variations on both.

A 'place' is where you need to go to find some code or data. It's an absolute location e.g. the source code is located on machine X in the folder Y. The binary is located on machine P in the folder Q. These are both rather 'physical' places, even if they are in the digital realm.

With the first two rules it is clear that to be organized you need to define a rather large number of places and categorize all data and code into 'everything'. Then you need to make sure that all things are stored exactly where they belong. In principle it isn't that hard; in practice any medium to large computer system consists of millions of little moving parts. To get all of those into the proper place just seems like a huge amount of work. Really it's not, particularly if you have some overarching 'philosophy' for the organization. Higher-level rules, i.e. best practices, fit generalizations over the details and when done appropriately can provide strong organizational guidance towards keeping a project clean. And then it is actually far less effort than being rampantly disorganized.

The third rule is necessary to ensure that there aren't too many things all dumped into one big place. Some unscrupulous coders could claim that a ball of mud is organized because all of the code is in one giant directory in the repo. That, of course, is nonsense. Hundreds or thousands of source files in one giant mess is the antithesis of organization. The third rule fixes that. Everything in one place must be 'similar' or it just isn't organized.

So what does 'similar' mean? In my last post on Abstraction I talked about the rather massive permutations available for collections of sets. This directly implies that there are huge variations on degrees of possible similarity. As well, different people have very different tolerances for the proximity of any two things. Some people can find similarities at a great distance, while others only see them if they are close together. Taken together this implies that 'similarity' is subjective.

Two completely different people may disagree on what is close enough to be similar, and that of course propagates up towards organization. Different people will have very opposing views as to what is really organized, and what is disorganized. That being said, there is always some outer boundary on which the majority of reasonable people will agree that two things are definitely not similar, and it is this boundary that is the minimal necessity for the third rule. That is, there is basic organization and then there are many more advanced, more eclectic versions. If a project or process doesn't even meet the basics, then it is clearly doomed. If it twists a little to be tightly organized relative to a specific person's organizational preferences, then it should at least be okay, and it is always possible to re-organize it at some point in the future if the staff changes.

The corollary to this definition does imply, however, that if there are four programmers working on the same project with four distinctly different organization schemes, then if they overlap in any way, the project is in fact disorganized. If the programmers all choose different naming conventions for the same underlying data, then a single thing is essentially stored in four different places, not one, violating the first rule. If duplicate code appears in many places, then the second rule is violated. If some component is the dumping ground for all sorts of unrelated functionality, then the third rule is broken. New programmers ignoring the conventions and tossing a "bag on the side" of an existing project are actually making it disorganized.
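One of these violations, duplicate code appearing in many places, can be checked for mechanically. What follows is only a minimal sketch, assuming the source files have already been read into a dictionary of path to text, and treating two files as duplicates when they match after whitespace is trimmed. The function name `duplicated_code` and the sample paths are hypothetical, chosen purely for illustration.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def duplicated_code(sources: Dict[str, str]) -> List[List[str]]:
    """Group paths whose contents are identical once blank lines and
    leading/trailing whitespace on each line are ignored.

    This is a crude proxy (an assumption of this sketch): real duplication
    is often partial or renamed, which this will not catch.
    """
    by_digest = defaultdict(list)
    for path, text in sources.items():
        # Normalize so that indentation or trailing blank lines do not hide a copy.
        normalized = "\n".join(
            line.strip() for line in text.splitlines() if line.strip()
        )
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        by_digest[digest].append(path)
    # Only content that lives in more than one place is a violation.
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]


# Hypothetical sample data: the same helper copied into two components.
sample = {
    "billing/util.py": "def total(xs):\n    return sum(xs)\n",
    "reports/helpers.py": "def total(xs):\n  return sum(xs)\n\n",
    "core/io.py": "def read(p):\n    return open(p).read()\n",
}
duplicates = duplicated_code(sample)
```

Each group in the result is one 'thing' that currently has more than one place. Deciding which place is the right one is still a human judgment.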

A large and growing project will always have some disorganization. But what is important is that there is continual work to explicitly address this. The data and code continually grow, drifting in similarity, so once the disorganization starts to impact the work it needs to be addressed before it consumes a significant amount of time. And it needs to be handled consistently with the exact same organization scheme that already applies to everything else. A project that sees this as the minimum mandatory work involved in building up a system is one with a fighting chance for success. A project where this isn't happening is in trouble.

Testing to see if a system is organized is fairly simple. You just need to go through the data and code and see if, for every place, the things are all similar. If there is lots of stuff out of place, then it is disorganized. If everything fits exactly where it should, then not only is it organized but it is also often described as 'beautiful'. The term 'elegant' is also applied to code, and it refers to the ability to make a very complex problem appear to be rather simple. Underlying that achievement is an excellent organizational scheme, not just a good one.
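The test described here can be sketched in a few lines. Since similarity is subjective, this sketch assumes someone has already labelled each thing with a category, and it treats the most common category in a place as what that place is 'for'. The name `misplaced_items`, the category labels and the sample paths are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

# An item is (name, category). The category is a human-assigned label,
# which is an assumption of this sketch.
Item = Tuple[str, str]


def misplaced_items(places: Dict[str, List[Item]]) -> Dict[str, List[str]]:
    """For every place, report the items that are not similar to the rest.

    'Similar' is approximated as 'shares the dominant category of the place'.
    An empty report means every place passes the third rule under that
    approximation.
    """
    report: Dict[str, List[str]] = {}
    for place, items in places.items():
        if not items:
            continue
        counts = Counter(category for _, category in items)
        dominant, _ = counts.most_common(1)[0]
        strays = [name for name, category in items if category != dominant]
        if strays:
            report[place] = strays
    return report


# Hypothetical layout: one imaging helper has drifted into the billing code.
layout = {
    "src/billing": [
        ("invoice.py", "billing"),
        ("tax.py", "billing"),
        ("logo_resize.py", "imaging"),
    ],
    "src/imaging": [("crop.py", "imaging")],
}
strays_by_place = misplaced_items(layout)
```

A stricter or looser notion of similarity only changes how the categories are assigned. The walk over every place stays the same.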

Organization relates back to simplification and complexity. I talked about this in The Nature of Simplification nearly ten years ago, but it was in regards to simplifying without respect to the whole context. A bad simplification like the files/folder example is also disorganization because it gradually grows to violate the similarity rule. This feeds back into complexity in that disorganization is a direct form of added 'artificial' complexity. A mess is inordinately more complex than a well-designed system, but that mess is not intrinsic to the solution. It could have been done differently.

Organization ties back to other concepts such as Encapsulation and Architecture. A fully defined place for all of the data and code that are 'interrelated' is an alternative definition of Encapsulation. Architecture is often described as the main 'lines' or structures over which everything else fits, which is in a real sense an upper level of organization of the places themselves. Given enough places, they need to be put into meta-places and enough of those need to be organized too, etc. A massive system requires many layers to prevent any potential disorganization from just being pushed elsewhere and ignored. Organization is upwardly recursive as the scale increases.

Applying this definition of organization to processes is a bit tricky, but very similar. As always, I think it's easier to work backwards for understanding process issues. Starting with the users actually accessing the necessary functionality to solve their problems, we can see that the minimal organization includes getting each dependent 'thing' into its place with the minimal amount of effort. So the process of upgrading a system, for example, is well-organized when the only steps necessary are all dissimilar from each other. If there are five config files, then there ought to be just one step that gets them all installed into the right place. More importantly, if upgrading occurs more than once, then actually automating the process is in itself a form of organizing it.
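The config file example might look something like the following when automated. It is only a sketch under stated assumptions: the config files sit together in one source directory, they all belong in a single target directory, and plain copying is enough to 'install' them. The function name `install_configs` and its parameters are hypothetical.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def install_configs(
    source_dir: Path, target_dir: Path, names: Iterable[str]
) -> List[Path]:
    """Install every named config file in one step.

    All files are checked before anything is copied, so the step does not
    stop halfway because of a missing file.
    """
    names = list(names)
    missing = [name for name in names if not (source_dir / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing config files: {missing}")

    target_dir.mkdir(parents=True, exist_ok=True)
    installed = []
    for name in names:
        destination = target_dir / name
        # copy2 also preserves timestamps and permission bits where possible.
        shutil.copy2(source_dir / name, destination)
        installed.append(destination)
    return installed
```

Whether there are five files or fifty, the upgrade still has one step for configs, which is the point of the paragraph above.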

Switching out to the earlier side of the development stages, the output of analysing an underlying problem to be solved requires that everything known and discovered is appropriately stored in the correct place, in the correct form. It isn't mashed together, but rather partitioned properly so that it is most useful for the next design stage. If this principle of organization is followed, it affects everything from the product concept to the actual feedback from operations. Everything is appropriately categorized, collected and then stored in its right place. The process is organized and anyone, at any stage, can just know where to find the correct information they need or know immediately that it is missing (and is now work to be done).

This may seem like a massive effort, but really it's not. There is no point collecting and organizing data from one stage if it's not going to be used in another. Big software projects sometimes amplify make-work because of misguided perspectives on what 'could' be useful, but after decades of development it becomes rather clear as to what 'will' actually be useful and what is just wasted effort. Processes should be crafted by people with significant hands-on experience or they miss those key elements. In a disorganized process, no real distinction can be made on the value of work, since you cannot ever be sure if there will be usage someday. In a well-organized project, spotting make-work is really easy.

One can extend this definition of organization fully to processes by substituting the nouns in the above rules with the appropriate verbs for the process. Then there is a process for every action, and every action takes place in the appropriate process. As well, there is a similarity to all of the actions within the same process. Of course with computers it is entirely possible to automate significant parts of the process, such that an overwhelming number of verbs really fall back to their output; back to nouns again. Seeing the work in this way, one can structure a methodology that ensures that the right outputs are always constructed at the right time, thus ensuring efficiencies throughout the flow of work.

A well-organized process is not at all as onerous as it sounds; rather, it makes concentrating on the quality of work much easier in that there are fewer disruptions and unexpected changes. As well, there are fewer arguments about how to handle problems, in that the delegation of work is no longer as subjective. For instance, if at the coding stage it becomes clear that some proposed functionality just won't work as advertised, then the work reverts back to the design stage or even further to the analysis. There is little panic, and an opportunity to ensure that in the future the missing effort isn't continually repeated. In a disorganized process, someone generally just whips out duct tape and any feedback is lost, so the same issues repeat over and over again. In that sense, an organized process is helpful to everyone involved, at a minimal cost. If chaos and pain are frequent process problems, then disorganization and make-work are strongly present.

A development project that is extremely well-organized is one that is fairly simple to manage and will produce software at a predictable pace, allowing for estimations to properly define priorities. A smooth development project, of course, provides more options for the work to match the required solutions. Organization feeds directly into the ability of the work to fulfil its goals in a reasonable time, at a reasonable cost, the very thing that users want most from software development. Organization is initially more expensive, but in any effort that is over a few months it quickly pays for itself. In a project expected to take years, with multiple developers, it is fairly insane not to take it seriously.

Organization springs from some very simple ideas. Being somewhat subjective, it is not the sort of thing that can be appropriately defined by a standard or a committee. Really the core leaders at each and every stage of development need to set up their own form of organization and apply it consistently to everything (old or new). If that is completed, then the development work proceeds without significant friction. If it is just left to happen organically, it won't. Once disorganization takes hold, it will always get worse. It's a vicious cycle, with more and more resources sucked away. It can be painful to reverse this, but given that it is absolutely fatal to a project, ignoring it won't work. Organization is a necessity to build all but the most trivial of software.

Sunday, October 25, 2015
