“Everything is relative in this world, where change alone endures.”
A huge problem in software development is creating static, rigid models of a world that is constantly in flux. It’s easy to capture some of the relationships, but getting them all correct is an impossible task. Often, in the rush, people hold the model constant and then overload parts of it to handle the change. Those types of hacks usually end badly. Screwed-up data in a computer can often be worse than no data. It can take longer to fix the problem than it would to just start over. But of course, if you do that, all of the history is lost.
One way to handle the changing world is to make the meta-relationships dynamic. Binding the rules to the data gets pushed upward towards the users; they become responsible for enhancing the model. The abstractions to do this are complex, and it always takes longer to build them than to just belt out the static connections, but it is often worth adding this type of flexibility directly into the system. There are plenty of well-known examples, such as DSLs, dynamic forms and generic databases. Technologies such as NoSQL and ORMs support this direction. Dynamic systems (not to be confused with the mathematical ‘dynamic programming’) open up the functionality to allow the users to extend it as the world turns. Scope creep ceases to be a problem for the developers; it becomes standard practice for the users.
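As a rough illustration of pushing the model toward the users, here is a minimal sketch of a ‘dynamic form’ in Python. The FieldSpec and FormDefinition names are invented for the example; the point is only that the field definitions live as data, so users can add new fields at runtime without waiting for a code change.

```python
from dataclasses import dataclass, field

@dataclass
class FieldSpec:
    name: str
    type_: type           # the rule bound to the data, e.g. int or str
    required: bool = True

@dataclass
class FormDefinition:
    fields: list = field(default_factory=list)

    def add_field(self, spec: FieldSpec) -> None:
        # Users (or administrators) extend the model here; no redeploy needed.
        self.fields.append(spec)

    def validate(self, record: dict) -> list:
        errors = []
        for spec in self.fields:
            if spec.name not in record:
                if spec.required:
                    errors.append(f"missing field: {spec.name}")
            elif not isinstance(record[spec.name], spec.type_):
                errors.append(f"{spec.name} should be {spec.type_.__name__}")
        return errors

# The "customer" form starts small and grows as the world turns.
customer = FormDefinition()
customer.add_field(FieldSpec("name", str))
customer.add_field(FieldSpec("age", int, required=False))
print(customer.validate({"name": "Ada", "age": "forty"}))  # ['age should be int']
```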
Abstracting a model to accommodate reality, without just letting all of the constraints run free, is tricky. All data could be stored as unordered variable-length strings, for instance, but the total lack of structure renders the data useless. There need to be categorizations and relationships to add value, but they need to exist at a higher level. The trick I’ve found over the years is to start very statically. For all domains there are well-known nouns and verbs that just don’t change. These form the basic pieces. Structurally, as you model these pieces, the same types of meta-structures reappear often. We know, for example, that information can be decomposed into relational tables and linked together. We know that information can also be decomposed into data structures (lists, trees, graphs, etc.) and linked together. A model gets constructed on these types of primitives, whose associations form patterns. If multiple specific models share the same structure, they can usually be combined and, with a little careful thought, named properly. Thus all of the different types of lists can become just one set of lists, all of the trees can come together, and so on. This lifts up the relationships, by structural similarity, into a considerably smaller set of common relationships. This generic set of models can then be tested against the known or expected corner-cases to see how flexible it will be. In this practice, ambiguity and scope changes just get built directly into the model. They become expected.
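A tiny sketch of what that merging might look like, assuming nothing beyond the standard library: two specific hierarchies, an org chart and a product catalogue, collapse into one generic tree primitive, so a single traversal serves both. The names here are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class Node:
    kind: str                      # the named category, e.g. "employee", "category"
    value: Any                     # the domain payload
    children: list = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        self.children.append(child)
        return child

def walk(node: Node, depth: int = 0) -> None:
    # One traversal works for every model that shares the tree structure.
    print("  " * depth + f"{node.kind}: {node.value}")
    for child in node.children:
        walk(child, depth + 1)

# Both hierarchies reuse the same structure and the same code.
org = Node("employee", "CEO")
org.add(Node("employee", "CTO")).add(Node("employee", "Developer"))

catalogue = Node("category", "Hardware")
catalogue.add(Node("category", "Keyboards"))

walk(org)
walk(catalogue)
```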
Often when enhancing the dynamic capabilities of a system there are critics who complain of over-engineering. Sometimes that is a valid issue, but only if the underlying model is undeniably static. There is a difference between ‘extreme’ and ‘impossible’ corner-cases; building for the impossible is a waste of energy. Often, though, the general idea of abstraction and dynamic systems just scares people. They have trouble ‘seeing it’, so they assume it won’t work. From a development point of view, that’s where encapsulation becomes really important. Abstractions need to be tightly wrapped in a black box. From the outside, the boxes are as static as any other piece of the system. This opens up the development to allow a wide range of people to work on the code, while still leveraging sophisticated dynamic behavior.
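A sketch of that kind of wrapping, with invented names: a dynamic, fully generic store sits behind a repository whose interface looks entirely static from the outside, so most developers never have to ‘see’ the abstraction at all.

```python
class GenericStore:
    """Internal dynamic model: every fact is (entity_type, id, attribute, value)."""

    def __init__(self):
        self._rows = []

    def put(self, entity_type, entity_id, attribute, value):
        self._rows.append((entity_type, entity_id, attribute, value))

    def query(self, entity_type, attribute):
        return [(i, v) for (t, i, a, v) in self._rows
                if t == entity_type and a == attribute]


class CustomerRepository:
    """The black box: from the outside this looks like any static API."""

    def __init__(self, store: GenericStore):
        self._store = store

    def add_customer(self, customer_id: int, name: str) -> None:
        self._store.put("customer", customer_id, "name", name)

    def customer_names(self) -> list:
        return [name for _, name in self._store.query("customer", "name")]


store = GenericStore()
repo = CustomerRepository(store)
repo.add_customer(1, "Ada")
print(repo.customer_names())   # ['Ada']
```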
I’ve often wondered how abstract a system could get before its performance was completely degraded. There is a classic tradeoff involved. A generic schema in an RDBMS, for example, will ultimately have slower queries than a static 4NF schema, and a slightly denormalized schema will perform even better. Still, in a big system, is losing a little bit of performance an acceptable cost for not having to wait four months for a predictable code change to get done? I’ve always found it reasonable.
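To make the tradeoff concrete, here is a small sketch using SQLite from Python. The tables and columns are made up for illustration: the static schema answers the question directly, while the generic attributes-as-rows schema needs a self-join for the very same question, and that cost grows with every additional attribute in the filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Static schema: one column per attribute, direct lookups.
cur.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
cur.execute("INSERT INTO customer VALUES (1, 'Ada', 'London')")

# Generic schema: attributes are rows, so new attributes need no ALTER TABLE,
# but every read becomes one or more self-joins.
cur.execute("CREATE TABLE attr (entity_id INTEGER, name TEXT, value TEXT)")
cur.executemany("INSERT INTO attr VALUES (?, ?, ?)",
                [(1, "name", "Ada"), (1, "city", "London")])

# The same question asked of both models:
print(cur.execute("SELECT name FROM customer WHERE city = 'London'").fetchall())
print(cur.execute("""
    SELECT n.value FROM attr n
    JOIN attr c ON c.entity_id = n.entity_id
    WHERE n.name = 'name' AND c.name = 'city' AND c.value = 'London'
""").fetchall())
```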
But it is possible to go way too far and cause massive performance problems. Generic relationships wash out the specifics and drive the code into NP-complete territory or worse. You can model anything and everything with a graph, but the time to extract the specifics is deadly and climbs at least exponentially with scale. A fully generic model, where everything is just a relationship to everything else, is possible but rather impractical at the moment. Somewhere down the line, some relationships have to be held static in order for the system to perform. Fewer is better, but some are always necessary.
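A toy example of why, under the assumption that questions against such a model become graph pattern matching: with nothing but generic edges, the query engine is left doing brute-force matching, and the candidate space grows combinatorially with graph size and pattern size (general subgraph matching is NP-complete). All of the data and names below are invented.

```python
from itertools import permutations

# Everything is just a relationship between everything else.
edges = {("order1", "placed_by", "alice"),
         ("order1", "contains", "widget"),
         ("order2", "placed_by", "bob")}
nodes = {n for (s, _, o) in edges for n in (s, o)}

# Pattern: find ?o and ?p such that ?o placed_by ?p and ?o contains "widget".
pattern = [("?o", "placed_by", "?p"), ("?o", "contains", "widget")]

def matches(binding):
    for s, rel, o in pattern:
        if (binding.get(s, s), rel, binding.get(o, o)) not in edges:
            return False
    return True

# Brute force over every assignment of the variables to graph nodes: the
# number of candidate bindings explodes as the graph and pattern grow.
hits = []
for combo in permutations(nodes, 2):
    binding = dict(zip(("?o", "?p"), combo))
    if matches(binding):
        hits.append(binding)
print(hits)  # [{'?o': 'order1', '?p': 'alice'}]
```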
Changing relationships between digital symbols mapped back to reality is the basis of all software development. These can be modeled with higher-level primitives and merged together to avoid redundancies and cope with expected changes. These models drive the heart of our software systems; they are the food for the algorithmic functionality that helps users solve their problems. Cracks in these foundations propagate across the system and eventually disrupt the users’ ability to complete their tasks. From this perspective, a system is only as strong as its models of reality. It’s only as flexible as they allow. Compromise these relationships, and all you get is unmanageable and unnecessary complexity that invalidates the usefulness of the system. Get them right, and the rest is easy.