Sunday, June 16, 2013

Relationships

“Everything is relative in this world, where change alone endures.”

A huge problem in software development is that we create static, rigid models of a world that is constantly in flux. It’s easy to capture some of the relationships, but getting them all correct is an impossible task.
Often, in the rush, people hold the model constant and then overload parts of it to handle the change. Those types of hacks usually end badly. Screwed-up data in a computer can often be worse than no data. It can take longer to fix the problem than it would to just start over, but of course if you do that, all of the history is lost.
One way to handle the changing world is to make the meta-relationships dynamic. Binding the rules to the data gets pushed upward towards the users, who become responsible for enhancing the model. The abstractions to do this are complex, and it always takes longer to build them than just belting out the static connections, but it is often worth adding this type of flexibility directly into the system. There are plenty of well-known examples, such as DSLs, dynamic forms and generic databases. Technologies such as NoSQL and ORMs support this direction. Dynamic systems (not to be confused with the mathematical ‘dynamic programming’) open up the functionality to allow the users to extend it as the world turns. Scope creep ceases to be a problem for the developers; it becomes standard practice for the users.
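To make the ‘dynamic forms’ idea concrete, here is a minimal sketch in Python. The names (FieldDef, FormModel) and the validation rules are my own illustration, not anything prescribed by a particular product; the point is only that the field definitions live as data, so a user can extend the model without a code change.

```python
# Sketch: the model's fields are data, not code, so users can extend them at runtime.
# FieldDef and FormModel are hypothetical names used only for this illustration.

from dataclasses import dataclass, field


@dataclass
class FieldDef:
    name: str
    kind: str               # e.g. "text", "number", "date"
    required: bool = False


@dataclass
class FormModel:
    fields: list = field(default_factory=list)

    def add_field(self, fdef: FieldDef) -> None:
        # Users (or admins) enhance the model at runtime; the code stays the same.
        self.fields.append(fdef)

    def validate(self, record: dict) -> list:
        errors = []
        for f in self.fields:
            if f.required and f.name not in record:
                errors.append(f"missing required field: {f.name}")
        return errors


# A "scope creep" request becomes a data change, not a development task.
customer = FormModel()
customer.add_field(FieldDef("name", "text", required=True))
customer.add_field(FieldDef("loyalty_tier", "text"))     # added later by a user
print(customer.validate({"loyalty_tier": "gold"}))       # ['missing required field: name']
```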
Abstracting a model to accommodate reality without just letting all of the constraints run free is tricky. All data could be stored as unordered variable strings, for instance, but the total lack of structure renders the data useless. There needs to be categorization and relationships to add value, but they need to exist at a higher level. The trick I’ve found over the years is to start very statically. For all domains there are well-known nouns and verbs that just don’t change. These form the basic pieces. Structurally, as you model these pieces, the same types of meta-structures reappear often. We know, for example, that information can be decomposed into relational tables and linked together. We know that information can also be decomposed into data structures (lists, trees, graphs, etc.) and linked together. A model gets constructed on these types of primitives, whose associations form patterns. If multiple specific models share the same structure, they can usually be combined and, with a little careful thought, named properly. Thus all of the different types of lists can become just one set of lists, all of the trees can come together, etc. This lifts up the relationships by structural similarity into a considerably smaller set of common relationships. This generic set of models can then be tested against the known or expected corner-cases to see how flexible it will be. In this practice, ambiguity and scope changes just get built directly into the model. They become expected.
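A small sketch of what lifting by structural similarity might look like: instead of separate OrgChart and CategoryTree models, one generic tree primitive carries both, distinguished only by a ‘kind’ label drawn from the static vocabulary of the domain. The names here are illustrative assumptions, not anything from a specific system.

```python
# Sketch: two specific hierarchies collapse into one generic tree primitive.

from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                      # "org-unit", "category", ... the static vocabulary
    name: str
    children: list = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        self.children.append(child)
        return child


def walk(node: Node, depth: int = 0) -> None:
    # One traversal works for every model that shares the tree structure.
    print("  " * depth + f"{node.kind}: {node.name}")
    for c in node.children:
        walk(c, depth + 1)


company = Node("org-unit", "Acme")
company.add(Node("org-unit", "Engineering"))

catalogue = Node("category", "Products")
catalogue.add(Node("category", "Hardware"))

for root in (company, catalogue):
    walk(root)
```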
Often when enhancing the dynamic capabilities of a system there are critics who complain of over-engineering. Sometimes that is a valid issue, but only if the underlying model is undeniably static. There is a difference between ‘extreme’ and ‘impossible’ corner-cases; building for the impossible is a waste of energy. Often, though, the general idea of abstraction and dynamic systems just scares people. They have trouble ‘seeing it’, so they assume it won’t work. From a development point of view, that’s where encapsulation becomes really important. Abstractions need to be tightly wrapped in a black box. From the outside, the boxes are as static as any other piece of the system. This opens up the development to allow a wide range of people to work on the code, while still leveraging sophisticated dynamic behavior.
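A rough sketch of that black box, under my own assumptions about the shape of the code: the caller sees ordinary, static-looking accessors, while the dynamic attribute store stays hidden inside.

```python
# Sketch: a static-looking facade over a dynamic attribute store (names are illustrative).

class Customer:
    """From the outside this looks like any other fixed, typed object."""

    def __init__(self):
        self._attrs = {}           # the dynamic part, hidden from callers

    @property
    def name(self) -> str:
        return self._attrs.get("name", "")

    @name.setter
    def name(self, value: str) -> None:
        self._attrs["name"] = value

    def set_extension(self, key: str, value) -> None:
        # The dynamic behavior is still reachable, but only through the box.
        self._attrs[key] = value


c = Customer()
c.name = "Alice"                      # reads like a plain, static field
c.set_extension("loyalty_tier", "gold")
print(c.name)                         # Alice
```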
I’ve often wondered about how abstract a system could get before its performance was completely degraded. There is a classic tradeoff involved. A generic schema in an RDBMS, for example, will ultimately have slower queries than a static 4th NF schema, and a slightly denormalized schema will perform even better. Still, in a big system, is losing a little bit of performance an acceptable cost for not having to wait four months for a predictable code change to get done? I’ve always found it reasonable.
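To make the tradeoff tangible, here is a rough sketch in SQLite assuming an EAV-style (entity–attribute–value) layout as the “generic schema”: adding a new attribute is just a row insert, no schema change, but reading a record back needs a join per attribute, which the static table avoids entirely. The tables and data are made up for illustration.

```python
# Sketch: generic EAV schema vs. a static schema -- flexibility traded for query cost.

import sqlite3

db = sqlite3.connect(":memory:")

# Generic schema: entity / attribute / value.
db.execute("CREATE TABLE eav (entity_id INTEGER, attribute TEXT, value TEXT)")
db.executemany("INSERT INTO eav VALUES (?, ?, ?)", [
    (1, "name", "Alice"),
    (1, "city", "Toronto"),
])

# Static schema: one column per attribute.
db.execute("CREATE TABLE customer (id INTEGER, name TEXT, city TEXT)")
db.execute("INSERT INTO customer VALUES (1, 'Alice', 'Toronto')")

# The generic query needs a self-join for every attribute it pulls back...
generic = db.execute("""
    SELECT n.value, c.value
    FROM eav n JOIN eav c ON n.entity_id = c.entity_id
    WHERE n.attribute = 'name' AND c.attribute = 'city' AND n.entity_id = 1
""").fetchone()

# ...while the static one is a plain lookup.
static = db.execute("SELECT name, city FROM customer WHERE id = 1").fetchone()

print(generic, static)   # ('Alice', 'Toronto') ('Alice', 'Toronto')
```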
But it is possible to go way too far and cause massive performance problems. Generic relationships wash out the specifics and drive the code towards NP-complete problems or worse. You can model anything and everything with a graph, but the time to extract out the specifics is deadly and climbs at least exponentially with increases in scale. A fully generic model, where everything is just a relationship between everything else, is possible, but rather impractical at the moment. Somewhere down the line, some relationships have to be held static in order for the system to perform. Less is better, but some are always necessary.
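Here is a toy sketch of that fully generic extreme, with made-up data: everything is a (subject, relation, object) triple, so even a simple question becomes a scan-and-rejoin over the whole set, and holding one known relationship static, here as a plain index, is what brings the lookup cost back down.

```python
# Sketch: "everything is a relationship" vs. holding one relationship static.

triples = [
    ("alice", "works_for", "acme"),
    ("acme",  "located_in", "toronto"),
    ("bob",   "works_for", "acme"),
]

# Fully generic: a nested scan over the entire set of relationships.
cities = [
    (s1, o2)
    for (s1, r1, o1) in triples if r1 == "works_for"
    for (s2, r2, o2) in triples if r2 == "located_in" and s2 == o1
]
print(cities)   # [('alice', 'toronto'), ('bob', 'toronto')]

# Holding one relationship static: index "works_for" once, then lookups are direct.
works_for = {}
for s, r, o in triples:
    if r == "works_for":
        works_for.setdefault(o, []).append(s)
print(works_for.get("acme"))   # ['alice', 'bob']
```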
Changing relationships between digital symbols mapped back to reality is the basis of all software development. These can be modeled with higher-level primitives and merged together to avoid redundancies and cope with expected changes. These models drive the heart of our software systems; they are the food for the algorithmic functionality that helps users solve their problems. Cracks in these foundations propagate across the system and eventually disrupt the users’ ability to complete their tasks. From this perspective, a system is only as strong as its models of reality. It’s only as flexible as they allow. Compromise these relationships and all you get is unmanageable and unnecessary complexity that invalidates the usefulness of the system. Get them right and the rest is easy.

Saturday, June 1, 2013

Process

A little process goes a long way. Process is, after all, just a manifestation of organization. It lays out an approach to some accomplishment as a breakdown of its parts. For simple goals the path may be obvious, but for highly complex things the process guides people through the confusion and keeps them from missing important aspects.
Without any process there is just disorganization. Things get done, but much is ignored or forgotten. This anti-work usually causes big problems, and these feed back into the mix, preventing more work from getting accomplished. A cycle ensues, which among other problems generally affects morale, since many people start sensing how historic problems keep repeating themselves. Things either swing entirely out of control, or wise leadership steps in with some "process" to restore the balance.
Experience with the chaotic non-process can often lead people to believe that any and all processes are a good thing. But the effectiveness of process is essentially a bell curve. On the left, with no process, the resulting work accomplished is low. As more process is added, the results get better. But there is a maximal point, a point at which the process has done all that it can, after which the results start falling again. A huge over-the-top process can easily send the results right back to where they started. So too much process is a bad thing. Often a very bad thing.
Since the intent of having a process is to apply organization to an effort, a badly thought-out process defeats this goal. At its extreme, a random process, for example, is just formalized disorganization. Most bad processes are not truly random, but they can be overlapping, contradictory, or full of huge gaps in what they cover. These problems all help reduce their effectiveness. Enough of them can drive the results closer to being random.
Since a process is keyed to a particular set of activities or inquiries, it needs to take the underlying reality into account. To do this it should be drafted from a 'bottom-up' perspective. Top-down process rules are highly unlikely to be effective, primarily because they are drafted from an over-simplification of the details. This causes a mismatch between the rules and the work, enhancing the disorganization rather than fixing it.
Often bad process survives, even thrives, because its originators incorrectly claim success. A defective software development process, for instance, may appear to be reducing the overall number of bugs reaching the users, but the driving cause of the decrease might just be the throttling of the development effort. Less work gets done, thus there are fewer bugs created, but there is also a greater chance for upper management to claim a false victory.
It's very easy to add complexity to an existing process. It can be impossible to remove it later. As such, an overly complex process is unlikely to improve. It just gets stuck in place, becoming an incentive for any good employees to leave, and then it continues to stagnate over time. This can go on for decades. Thus arguing for the suitability of a process based on the fact that it has been around for a long time is invalid. All it shows is that it is somewhat better than random, not that it is good or particularly useful in any way.
Bad process leaves a lot of evidence lying around that it is bad. Often the amount of work getting accomplished is pitifully low, while the amount of useless make-work is huge. Sometimes the people stuck in the process are forced to bend the truth just to get anything done. They get caught between getting fired for getting nothing done or lying to get beyond the artificial obstacles. The division between the real work and the phantom variant required by the process manifests as a negative, conflict-based culture.
For software, picking a good process is crucial. Unfortunately, the currently available choices in the industry are all seriously lacking in their design. From experience, the great processes have all been carefully homegrown and driven directly by the people most affected by them. The key has been promoting a good engineering culture that has essentially self-organized. This type of evolution has been orders of magnitude more successful than going out and hiring a bunch of management consultants who slap on a pre-canned methodology and start tweaking it.
That being said, there have also been some horrific homegrown processes constructed that revel in stupid make-work and creatively kill off the ability to get anything done. Pretty much any process created by someone unqualified to do so is going to work badly. It takes a massive amount of direct experience with doing something over and over again before one can correctly take a step back and abstract out the qualities that make it successful. And abstraction itself is a difficult and rare skill, so just putting in the 10,000+ hours doesn't mean someone is qualified to organize the effort.
Picking a bad process and sticking to it is nearly the same as having no process. They converge on the same level of ineffectiveness.