Thursday, January 5, 2023

Dependencies

One of the trickiest issues in software is figuring out the exact dependencies between any two complex things. Sometimes you kind of know that one thing depends on the other, but the nature of that dependence is vague. To discuss it, we first have to go really abstract, then we can come back down to the underlying issues.

In the craziest, broadest sense, everything is basically dependent on everything else.

We live in one universe, so everything that we are aware of is currently in here with us. It all shares the same global ‘context’.

That’s a kind of fun way of looking at it. It implies that no two things are ever ‘entirely’ independent in that broadest context. Thus ‘independence’ is another human abstraction like ‘perfection’ and ‘infinity’. It’s a term we use loosely, but it's probably only a byproduct of our creative imagination, not the world around us.

We can talk about dependence relative to smaller, tighter contexts. And we can keep shrinking the context until any two things do become independent. The context only shrinks to null if they are the exact same thing. Even if they are absolutely perfect clones of each other, they still occupy different spatial coordinates, so some context remains.

This gives us a rather grand gradient. Any two things are dependent until the context is tight enough that they are independent.

As odd as that sounds, we can use it and its converse for some interesting stuff.

Given two things we think are independent, we can expand the context to where they are dependent. That can actually define the ‘dependencies’. If the smallest context where they are dependent is finite, then it is composed of a normalized set of variables. If we tighten the context as far as it will go, what is left is exactly the set of variables that defines the dependence between the two things.

And the opposite is true as well. Given any two dependent things, we can shrink the context until they are independent, then use that to figure out what is variable; what binds them.

What makes this interesting is that a context described that way is pretty much always a finite set of independent variables itself, and we can expand, contract, or shift it around as needed to get to a different context.

From this, it seems like there is only one ‘normalized’ context that sits on the border between independence and dependence, although it may be true that there are an infinite number of different basis variables (equivalent but alternative variable sets).

While this is a very abstract discussion so far, it does actually apply to software. It applies to both data and code, so we’ll start with the data first.

Two pieces of data are dependent on each other in some context. But that context may be looser than the given system parameters, so, with respect to the system, they are independent. We can fiddle with both, at the exact same time, without having to worry about interference. Thus we can parallelize any operations on the two independent pieces of data.
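As a rough sketch of that idea, here are two data sets that share nothing within the program, so the work on them can run at the same time without any coordination. The names `orders`, `page_views`, and `total` are made up for illustration, and the thread pool is just one convenient way to run things concurrently in Python.

```python
from concurrent.futures import ThreadPoolExecutor

def total(values):
    # Reads only its own argument and touches no shared state.
    return sum(values)

orders = [12, 30, 7]      # hypothetical data set
page_views = [100, 250]   # hypothetical data set, unrelated to the first

# Within this program the two lists are independent, so no locking is needed.
with ThreadPoolExecutor(max_workers=2) as pool:
    order_total, view_total = pool.map(total, [orders, page_views])
```

Neither call can interfere with the other because, relative to this little system, there is no variable that binds the two lists together.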

If, however, the context of the dependence is within the context of the system, then the two pieces of data are dependent on each other, and we cannot safely fiddle with both at the same time. We need to coordinate any actions, which means either some form of locking or utilizing atomic operations (which is just embedding an implicit lock into the operation itself).
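A minimal sketch of that coordination, assuming two balances that are bound together by the rule that their sum must never change. The `balances` dictionary, the `transfer` function, and the amounts are all hypothetical; the only real API used is `threading.Lock`.

```python
import threading

# Two dependent values: the system requires their sum to stay at 200.
balances = {"checking": 100, "savings": 100}
lock = threading.Lock()

def transfer(src, dst, amount, times):
    for _ in range(times):
        # The lock makes the paired update behave as one indivisible step.
        with lock:
            balances[src] -= amount
            balances[dst] += amount

threads = [
    threading.Thread(target=transfer, args=("checking", "savings", 1, 1000)),
    threading.Thread(target=transfer, args=("savings", "checking", 1, 1000)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

total_balance = sum(balances.values())
```

Both threads touch both values, so the lock is what keeps the dependence intact while they run side by side.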

If two pieces of data are dependent, and the system treats them as independent, that is effectively a race condition, and it will go wrong at some frequency of occurrence. That frequency might be once every 100 years, so the people who built the system may be unaware that it has a problem, but it is still a problem.
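Real races are hard to reproduce on demand, so this sketch fakes one by writing out a bad interleaving by hand. It assumes two workers, A and B, that each want to add 1 to the same counter using a separate read and write. The `counter` dictionary and the variable names are invented for the example.

```python
counter = {"value": 0}

# Each worker does "read, then write" as two separate steps.
read_a = counter["value"]        # worker A reads 0
read_b = counter["value"]        # worker B reads 0 before A has written
counter["value"] = read_a + 1    # A writes 1
counter["value"] = read_b + 1    # B also writes 1, overwriting A's update

# Two increments were intended, but only one survived.
lost_updates = 2 - counter["value"]
```

With real threads this ordering only happens occasionally, which is exactly why the problem can stay hidden for a long time.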

Data issues are effectively ‘runtime’ issues. They happen when the system is being used. Code issues are more often ‘compile time’ issues, as we are building stuff. Not being threadsafe, for example, is a code dependency that shows up at runtime, so those do exist too, but we will skip them for now.

With code, as you are generally putting together millions of instructions for the computer to follow, it would be best if you didn’t have redundant code everywhere. Why? It is a behavior dependency that degrades with time. That is, there is an implicit dependency on two pieces of code behaving the same way to make things work, but someone later edits one of them, which changes its behavior and triggers an unwanted side effect.

In that sense, given an overlap or commonality between any two sequences of instructions, if that dependency is relative to the context of the system, then keeping the code redundant is fragile and may be the root cause of triggering a bug in the not-too-distant future.
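Here is a small sketch of that drift, assuming a tax rule that was originally copied into two places. All of the function names and the rates are hypothetical. Amounts are in integer cents to keep the arithmetic exact.

```python
# Redundant version: the same rule is written out twice.
def invoice_total(cents):
    return cents + cents * 12 // 100   # edited later: rate raised to 12%

def report_total(cents):
    return cents + cents * 10 // 100   # forgotten copy, still at 10%

# Shared version: one definition, so the two callers cannot drift apart.
TAX_PERCENT = 12

def with_tax(cents):
    return cents + cents * TAX_PERCENT // 100

def invoice_total_shared(cents):
    return with_tax(cents)

def report_total_shared(cents):
    return with_tax(cents)

# The invoice and the report now disagree about the same amount.
drift = invoice_total(10_000) - report_total(10_000)
```

Nothing in the redundant version tells the person editing `invoice_total` that `report_total` was supposed to match it. The dependency is real but invisible.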

Some people believe that all that matters is the next release, but really a system is strong because it was built in anticipation of its full life span. That is, ignoring code dependencies is ignoring a fundamental requirement of the system itself, which is that it should continue to run correctly until it is finally retired.

That’s it at a high level, but we can get a little lower, and yet stay general, when discussing dependencies. One great issue is what we often call the ‘merge problem’.

In general, given two different sets of data that are related, a computer can’t automatically merge them. If there is a single variable with two values, you can set a precedence, and using that automatically merge. But if the data is composite, and you have two different versions of it, you can’t automatically merge. Why? Because some of the fields (attributes) may be dependent on other fields, and the changes themselves may involve context outside of the system. For example, if you have 5 fields, A .. E, and one user changes A and B, while another changes A and E, you can’t just put them together. Only the users would know whether B or E is more appropriate with the given A. They are a part of the context.
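The five-field example can be sketched as a simple field-by-field merge against a common starting version. This assumes the two users set A to different values. The `merge_records` function and the field values are made up for illustration, not a standard algorithm from any library.

```python
def merge_records(base, ours, theirs):
    """Field-by-field merge; returns the merged record and the conflicting fields."""
    merged, conflicts = {}, []
    for field in base:
        a, b = ours[field], theirs[field]
        if a == b:
            merged[field] = a            # both agree (or neither changed it)
        elif a == base[field]:
            merged[field] = b            # only the second user changed it
        elif b == base[field]:
            merged[field] = a            # only the first user changed it
        else:
            conflicts.append(field)      # both changed it, differently
            merged[field] = None         # left for a person to decide
    return merged, conflicts

base = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}
user_1 = {**base, "A": 10, "B": 20}      # first user changes A and B
user_2 = {**base, "A": 11, "E": 50}      # second user changes A and E

merged, conflicts = merge_records(base, user_1, user_2)
```

The code can flag A as a conflict, but it happily accepts the new B and the new E. Whether those two still make sense together, with whichever A wins, is something only the users know.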

Now, we do seem to get around that with code repositories, like Git, but there is a trick. Their merges are done line-by-line, and treating the code as plain lines of text that way reduces the number of merge conflicts. But they can still occur and often do, which is why we have so many tools and techniques for manually merging code. It kinda works but it also goes wrong fairly often. Enough that we can’t just automate it and expect it to be reliable.

So, it’s pretty fair to say that you can’t merge data unless you can constrain it within a very tight context. So, in general, you can’t do it, but some days you might get lucky.

Dependency plays out in other ways as well. You build a system with some libraries or framework dependencies. But those dependencies are recursive, and some of them clash. That would be okay if every dependency was effectively ‘runtime’ safe and you could load different versions into memory at the same time, but it is more often the case that there are dependent locations in memory that will get shared, which will cause unexpected behaviors. The authors might have left a static or global lying around, but are playing with it in different ways in the different versions.
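This is only a simulation inside one file, not a real case of loading two library versions, but it shows the shape of the problem. It assumes a hypothetical library whose version 1 stores a single handler in a shared global, while version 2 stores a list of handlers in that same global. Every name here is invented.

```python
# Stands in for a static or global that both versions end up sharing.
_registry = {}

def register_v1(name, handler):
    _registry[name] = handler        # v1 stores the handler itself

def dispatch_v1(name, arg):
    return _registry[name](arg)      # v1 expects something callable

def register_v2(name, handler):
    _registry[name] = [handler]      # v2 stores a list of handlers

register_v1("greet", str.upper)
before = dispatch_v1("greet", "hi")  # works while only v1 is involved

register_v2("greet", str.lower)      # v2 rewrites the shared slot its own way
try:
    dispatch_v1("greet", "hi")
    clash = None
except TypeError as exc:
    clash = type(exc).__name__       # v1 tried to call a list
```

Each version is fine on its own. It is only the shared location in memory that makes them dependent on each other.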

We see all sorts of other dependencies playing out in software projects. The development work itself is often dependent on other work. Domain logic is usually dependent on its industry. Synchronizing data is dependent on it not changing.

Complexity comes from the intricacies of the thing itself, plus all of its dependencies, within a given context. It is why tunnel vision is often a problem. You shrink the context until the dependencies become manageable, but if some of the discarded dependencies were significant, then when the work goes back into the larger context it goes very wrong. Dependencies drive complexity growth.
