The Programmer's Paradox: Relationships

“Everything is relative in this world, where change alone endures.”

A huge problem in software development is to create static, rigid models of a world constantly in flux. It’s easy to capture some of the relationships, but getting them all correct is an impossible task.

Often, in the rush, people hold the model constant and then overload parts of it to handle the change. Those types of hacks usually end badly. Screwed-up data in a computer can often be worse than no data. It can take longer to fix the problem than it would to just start over. But of course if you do that, all of the history is lost.

One way to handle the changing world is to make the meta-relationships dynamic. Binding the rules to the data gets pushed upward towards the users; they become responsible for enhancing the model. The abstractions to do this are complex, and it always takes longer to build them than to just belt out the static connections, but it is often worth adding this type of flexibility directly into the system. There are plenty of well-known examples such as DSLs, dynamic forms and generic databases. Technologies such as NoSQL and ORMs support this direction. Dynamic systems (not to be confused with the mathematical ‘dynamic programming’) open up the functionality to allow the users to extend it as the world turns. Scope creep ceases to be a problem for the developers; it becomes standard practice for the users.
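As a rough illustration of pushing the rules up to the users, here is a minimal sketch of a model whose fields are defined at runtime rather than in code. The names (FieldDef, EntityType, the customer example) are hypothetical and chosen only for this example; a real system would also need persistence, permissions and versioning.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class FieldDef:
    """A user-defined field: a name, a type, and an optional rule."""
    name: str
    type_: type
    rule: Optional[Callable[[Any], bool]] = None


@dataclass
class EntityType:
    """A meta-level description of a kind of record, editable at runtime."""
    name: str
    fields: dict[str, FieldDef] = field(default_factory=dict)

    def add_field(self, field_def: FieldDef) -> None:
        # Users extend the model here; the code itself does not change.
        self.fields[field_def.name] = field_def

    def validate(self, record: dict[str, Any]) -> list[str]:
        """Return a list of problems; an empty list means the record fits."""
        problems = [f"unknown field: {key}" for key in record if key not in self.fields]
        for name, fd in self.fields.items():
            if name not in record:
                problems.append(f"missing field: {name}")
            elif not isinstance(record[name], fd.type_):
                problems.append(f"wrong type for {name}")
            elif fd.rule is not None and not fd.rule(record[name]):
                problems.append(f"rule failed for {name}")
        return problems


customer = EntityType("customer")
customer.add_field(FieldDef("name", str))
customer.add_field(FieldDef("age", int, rule=lambda value: value >= 0))
print(customer.validate({"name": "Ada", "age": 36}))  # []

# Later the world changes, and a user adds a field without a release.
customer.add_field(FieldDef("email", str))
print(customer.validate({"name": "Ada", "age": 36}))  # ['missing field: email']
```

The static part is small (what a field is, what a rule is), while the part that changes with the world lives in data the users control.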

Abstracting a model to accommodate reality without just letting all of the constraints run free is tricky. All data could be stored as unordered variable strings, for instance, but the total lack of structure renders the data useless. There needs to be categorization and relationships to add value, but they need to exist at a higher level. The trick I’ve found over the years is to start very statically. For all domains there are well-known nouns and verbs that just don’t change. These form the basic pieces. Structurally, as you model these pieces, the same types of meta-structures reappear often. We know, for example, that information can be decomposed into relational tables and linked together. We know that information can also be decomposed into data-structures (lists, trees, graphs, etc.) and linked together. A model gets constructed on these types of primitives, whose associations form patterns. If multiple specific models share the same structure, they can usually be combined and, with a little careful thought, named properly. Thus all of the different types of lists can just become one set of lists, all of the trees can come together, etc. This lifts up the relationships by structural similarity into a considerably smaller set of common relationships. This generic set of models can then be tested against the known or expected corner-cases to see how flexible it will be. In this practice, ambiguity and scope changes just get built directly into the model. They become expected.
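To make the merging step concrete, here is a small sketch in which two specific models, an org chart and a product catalog, turn out to be the same tree and so share one generic structure. The Node class, its kind label and the outline helper are assumptions made up for this example, not a prescribed design.

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Node:
    """One generic tree node; 'kind' carries the domain-specific noun."""
    kind: str
    label: str
    children: list["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        self.children.append(child)
        return child

    def walk(self, depth: int = 0) -> Iterator[tuple[int, "Node"]]:
        """Yield (depth, node) pairs in depth-first order."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)


def outline(root: Node) -> list[str]:
    """Render any tree the same way, whatever domain it came from."""
    return ["  " * depth + f"{node.kind}: {node.label}" for depth, node in root.walk()]


# An org chart and a product catalog share one structure.
org = Node("department", "Engineering")
team = org.add(Node("team", "Platform"))
team.add(Node("employee", "Ada"))

catalog = Node("category", "Books")
catalog.add(Node("category", "Fiction"))

print("\n".join(outline(org)))
print("\n".join(outline(catalog)))
```

A corner-case such as a department that contains another department needs no new code here, which is the kind of check the generic model should be run against.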

Often when enhancing the dynamic capabilities of a system there are critics who complain of over-engineering. Sometimes that is a valid issue, but only if the underlying model is undeniably static. There is a difference between ‘extreme’ and ‘impossible’ corner-cases; building for the impossible is a waste of energy. Oftentimes, though, the general idea of abstraction and dynamic systems just scares people. They have trouble ‘seeing it’, so they assume it won’t work. From a development point of view that’s where encapsulation becomes really important. Abstractions need to be tightly wrapped in a black-box. From the outside the boxes are as static as any other piece of the system. This opens up the development to allow a wide range of people to work on the code, while still leveraging sophisticated dynamic behavior.
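A minimal sketch of that kind of black-box, assuming a hypothetical CustomerDirectory class: callers see ordinary fixed methods, while the storage underneath is a generic attribute map that can grow without touching them.

```python
class CustomerDirectory:
    """Static-looking API; the flexible storage stays hidden inside."""

    def __init__(self) -> None:
        # Internal generic store: entity id -> {attribute name -> value}.
        self._attributes: dict[int, dict[str, object]] = {}
        self._next_id = 1

    def _set(self, entity_id: int, name: str, value: object) -> None:
        # Any attribute can be stored; nothing here knows about 'email'.
        self._attributes.setdefault(entity_id, {})[name] = value

    def _get(self, entity_id: int, name: str) -> object:
        return self._attributes[entity_id][name]

    # Public methods below are as fixed as any other part of the system.
    def add_customer(self, name: str, email: str) -> int:
        customer_id = self._next_id
        self._next_id += 1
        self._set(customer_id, "name", name)
        self._set(customer_id, "email", email)
        return customer_id

    def email_of(self, customer_id: int) -> str:
        return str(self._get(customer_id, "email"))


directory = CustomerDirectory()
ada = directory.add_customer("Ada", "ada@example.com")
print(directory.email_of(ada))  # ada@example.com
```

Someone working against add_customer and email_of never has to ‘see’ the abstraction underneath in order to use it.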

I’ve often wondered about how abstract a system could get before its performance was completely degraded. There is a classic tradeoff involved. A generic schema in an RDBMS, for example, will ultimately have slower queries than a static fourth normal form (4NF) schema, and a slightly denormalized schema will perform even better. Still, in a big system, is losing a little bit of performance an acceptable cost for not having to wait 4 months for a predictable code change to get done? I’ve always found it reasonable.
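The tradeoff can be seen in miniature with SQLite from the Python standard library. This is only a sketch: the table and column names are invented for the example, and two rows say nothing about real timings. It just shows that the same question needs a self-join in the generic layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Static schema: one column per known attribute.
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    INSERT INTO customer VALUES (1, 'Ada', 'London'), (2, 'Grace', 'New York');

    -- Generic schema: one row per (entity, attribute, value) triple.
    CREATE TABLE attribute (entity_id INTEGER, name TEXT, value TEXT);
    INSERT INTO attribute VALUES
        (1, 'name', 'Ada'), (1, 'city', 'London'),
        (2, 'name', 'Grace'), (2, 'city', 'New York');
""")

# Static: a single-table lookup.
static_rows = conn.execute(
    "SELECT name FROM customer WHERE city = ?", ("London",)
).fetchall()

# Generic: the same question costs a self-join, one more per attribute involved.
generic_rows = conn.execute(
    """
    SELECT n.value
    FROM attribute AS c
    JOIN attribute AS n ON n.entity_id = c.entity_id AND n.name = 'name'
    WHERE c.name = 'city' AND c.value = ?
    """,
    ("London",),
).fetchall()

print(static_rows)   # [('Ada',)]
print(generic_rows)  # [('Ada',)]

# The payoff: a brand-new attribute is just data, with no schema change.
conn.execute("INSERT INTO attribute VALUES (1, 'tier', 'gold')")
```

The static table would need an ALTER TABLE and usually a release to gain the new attribute, which is the waiting time being weighed against the slower query.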

But it is possible to go way too far and cause massive performance problems. Generic relationships wash out the specifics and can drive the code toward problems that are NP-complete or worse. You can model anything and everything with a graph, but the time to extract the specifics is deadly and climbs at least exponentially with increases in scale. A fully generic model of everything just being a relationship between everything else is possible, but rather impractical at the moment. Somewhere down the line, some relationships have to be held static in order for the system to perform. Fewer is better, but some are always necessary.

Changing relationships between digital symbols mapped back to reality is the basis of all software development. These can be modeled with higher-level primitives and merged together to avoid redundancies and cope with expected changes. These models drive the heart of our software systems; they are the food for the algorithmic functionality that helps users solve their problems. Cracks in these foundations propagate across the system and eventually disrupt the user’s ability to complete their tasks. From this perspective, a system is only as strong as its models of reality. It’s only as flexible as they allow. Compromise these relationships and all you get is unmanageable and unnecessary complexity that invalidates the usefulness of the system. Get them right and the rest is easy.

Sunday, June 16, 2013