Tuesday, March 24, 2020

Universal Truth

“History is written by the victors.” -- Winston Churchill

At the moment, our species seems to mostly believe that ‘truth’ is a relative concept. Something is true if enough people believe it. It is false otherwise. This leads us to confusion, conflict and more than a few disasters.

The truthfulness of any given statement is in itself a very interesting discussion, but it's predicated on the underlying definitions of several different words. If we want to get to the ‘truth’, the first thing to do is carefully define the destination.

We’ll start with a thought experiment.

Let’s say that there is an ‘alien’ sitting on the moon right now, watching us. This alien is super-intelligent, and isn’t affected by our normal human biases. It’s asexual, so there isn’t a male/female bias, and it’s a different species that is extremely logical, but not emotional. It has nifty equipment that allows it to see through walls, hear everything, observe smells, and focus on whatever interests it at any time, even concurrently. In short, it is the perfect observer.

With all of these capabilities, the alien is able to assert that something did or did not happen at a particular time and location. In that role, it acts as an ‘oracle’ and can be precise.

Was a car parked outside of a specific house at 2pm? The alien knows the answer to this. Who drove that car there? That is also answerable. In particular, the alien can precisely state what occurred at a specific instant in time for a bunch of related particles: whether it happened, whether it didn’t happen, or whether it happened at a different time and place.

In some higher, more abstract sense, it is impossible to say that there is a ‘universal truth’. If something occurred way off in the darkest corner of the universe, where no one could have seen it or confirmed it, then only the universe itself is a witness to it. But the universe isn’t, we think, sentient enough to be able to pass on that knowledge. Our alien, however, while not able to give us universal truth, would be able to give us a very good approximation of it, relative to our planet, one that we have set up to be nearly indistinguishable from the universal variety.

Getting back to definitions, what we would like to do is ‘define’ all of the information that is available from the alien as something concrete. So, let's call it ‘facts’. We ask questions, and the alien gives us a set of related facts, and we know these are deeply trustworthy. If the alien says a car was driven by a particular individual, then that is what happened. Within that frame of reference, it is fundamentally 99.9999...% universal truth. It is not 100%, since we got the information from the alien, not from the universe itself, but we are so close to 100% that the infinitesimal difference shouldn’t make any impact in our lives.

If we take that as a definition of ‘facts’ and use it to discount anything that the alien couldn’t answer, we get a precisely defined set of information that has useful properties. The alien, for example, may be able to say who drove the car to the house, but it cannot explain ‘why’ they did it. The motivations of the driver and their agenda aren’t observable by the alien, just their actions. In that defined sense, then, the location of the car is entirely factual, and there is a universal truth associated with it. We may or may not be able to get that confirmation, since the alien is hypothetical, but if enough people here on earth were also witnesses, and they were trustworthy enough, then that could be an approximation to the alien’s approximation to the universal truth. So, we could get greater certainty that the car was in that location at that time, and that it was driven by that person.

So, if you’re not lost by the pedantic nature of this discussion, it says that although we don’t have an alien to explicitly tell us facts, we can arrive at successively better approximations of these facts simply by expanding our set of trustworthy witnesses and evidence. If we do that in a way that one bad fact can’t corrupt the others, then it seems likely that we can believe that these facts are as close to universal as we’ll ever be able to get in our lifetimes.

This, of course, is the basis for scientific discovery and most legal systems, but it can extend beyond those disciplines into all other aspects of our societies. We can theorize about what the alien could answer, and then we can collect together enough relative truths to approximate that.

“Ok, but how is this useful?”

It is often the case in our modern world that various parties try to convince us that something is the truth based on their “facts”. But frequently, the underlying definition of what a fact is has become so wide and vague that it would be impossible for an alien sitting on the moon to assert what they are saying.

These “incorrect facts” then are more often attempts to kick up dust, allowing people to propagandize some other agenda. This is unstoppable if truth really is relative, applies to anything, and the only truth that matters is the one that is repeated the most often. And we see that often now. It’s not actual facts under discussion but literal ‘nonsense’, shaped to be easily consumable. The winners are the ones that get their morsels out there the most often.

But as I have shown above, calling this stuff “facts” is incorrect. If someone asserts that it is a fact that the driver’s motive was to cause trouble, so they drove their car to the house, we can and should shut down the conversation by pointing out that that is not a “fact” by any reasonable definition. They are not being factually correct, thus any further assertions or theories built on top of this are only personal opinions. We cannot know the motives of the driver; even the alien cannot know this.

We can’t stop people from giving us their views, but we can stop them from asserting that their opinions are factual and thus the truth. They are not; they are just opinions. We do have various rules, laws and regulations in most societies that restrict people or companies from blatantly lying, but they are blunted by poor definitions. If we fixed that, then at bare minimum it would be a means to interfere with any type of propaganda campaign, using means of redress that society has already accepted. We’ve already agreed on how to deal with lies; now we just need to move that forward to include these highly questionable “facts”.

If we continue to allow ourselves to base our actions, plans, and institutions on continuously shifting relative truths, then this instability will permeate the fabric and structure of everything we do. It prevents us from achieving any of our goals. Long ago, we gained the ability to build up reasonable scientific knowledge, which, given our current technological sophistication, has made massive improvements to how we live and interact with each other. But we stopped halfway and left the door open to irrational communication. It’s time we revisit that aspect of our societies, and it is time that we fix it in order to further improve our lives. We’ve grown up a lot as a species over the last few hundred years, but there is still a great distance left to travel.

Monday, March 23, 2020

Organizing Complexity

Let's say we want to do something massively complex. By ‘massive’ we mean that there is no single human on the planet that could fully understand the whole thing, at any one time.

If they had to work on a ‘small part’ it would be fine, but if they need to do stuff that spans a bunch of parts, then they have to iterate through each one, one at a time. In doing that, they incur a significant risk that one or more of the sub-parts is incorrect. That is, somewhere along the way, they will derail.

What we would like to do is to give them a means of being certain that all of their work fits correctly into the big picture. The most obvious idea is to have a means of planning the work, before doing it, such that any problems or issues with the plan are transparent enough and can be corrected easily before the work is started.

The quality of a plan is dependent on its foundations. If the foundations are constantly shifting, the plan itself would need constant modification to keep up.

So an essential part of planning is to lock down every major thing that can vary.

In this case, it would be locked down for the length of the work, but afterward, it could be revisited and changed. That length of time would be some guesstimate about the sum of the expected work for the parts.

If the guesstimate were too short, and the changes couldn’t be delayed, then they might derail the plan by throwing off the continuity of the work, which would increase the likelihood that it is wrong. So, obviously, locking for too long is considerably less risky than locking for too short.

The plan itself is subject to complexity.

In that, planning for a chaotic or disorganized environment is at least orders of magnitude more complex than if everything is neat and tidy.

In that sense, the plan itself becomes one of the parts of the project and if it’s a large plan, then it falls right back to the same issues above. One would need to work through the parts of the plan to ensure that they maintain consistency and in order to do that correctly, one would need a meta-plan. This, of course, is upwardly recursive.

If we wanted to ensure that things would happen correctly, we would need smaller and smaller meta-plans, until we arrive at one that is small enough that a single person could create and check it properly. Then, that would expand to the next level, and as we walked through those parts, it would expand to the level below, etc.

This shows that the costs of disorganization are at least multiplicative. They make the actual work harder, they make the planning harder and everything on top of it as well, and they make the overall hierarchy larger.

So, the first big thing that would likely increase the likelihood of success would be to carefully organize as much as possible, with well-defined rules, so that any following planning or work went smoothly.

The converse is also true, in that if a big project fell apart, the most likely reasons were changes to the foundations and disorganization. Changes themselves are fairly easy to track, and if they haven’t been tracked, then that chaos is itself a form of disorganization. So, one would separate out the different weights of the causes of the problem: not locking down enough of the work, and just being disorganized (in many, many different ways). Usually, the latter is way more significant.

In its simplest form, organization is a place for everything and everything in its place. On top of this, we also need to control the different similar (but not exactly the same) things being in the same place, in that if there are only a few similar things, they can be grouped together, but once there are more than a few of them, they need their own distinct ‘places’ to break up the collection. That is a scaling property, which is often seen in that if things are really small, everything is similar, but as they start to grow, the similarities start to need to be differentiated. There is a recursive growth problem here too, in that the places themselves need to be organized, needing meta-places, etc. So, it’s never-ending, but it does slow down as things become large, massive, etc. Places need to be split, then, when there are enough of them, organized too.
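
That scaling rule can be sketched in code. The following is a minimal illustration in Python, under assumed names and an assumed split threshold (neither is prescriptive): items sit together in one place until there are more than a few similar ones, at which point that group gets its own place, and the places themselves are organized the same way, recursively.

```python
# A minimal sketch of the 'place for everything' scaling rule.
# SPLIT_THRESHOLD and the grouping functions are illustrative assumptions.

SPLIT_THRESHOLD = 4  # more than 'a few' similar things and they get their own place

def organize(items, key_funcs):
    """Recursively organize items into nested 'places'.

    key_funcs is a list of successively finer-grained grouping functions.
    Small collections stay flat; larger ones are split into sub-places,
    and those sub-places are themselves organized the same way.
    """
    if not key_funcs or len(items) <= SPLIT_THRESHOLD:
        return items  # small enough: everything similar sits together
    key, *rest = key_funcs
    places = {}
    for item in items:
        places.setdefault(key(item), []).append(item)
    # each place is itself organized, recursively (the 'meta-places')
    return {group: organize(members, rest) for group, members in places.items()}
```

The point of the sketch is only the shape: the organizing work never ends, it just slows down as each level settles.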

If you know that a project will, say, go from small, through medium and then to large, it would be far more effective to settle on an organization scheme to support the large size than to have to reorganize each time everything grows. Basically, if there were some ‘growth plan’ and its elements could be locked enough, you could skip the less organized variations and go straight to the heavier version, taking more time initially, but saving a huge amount overall.

So, the maximum locking time is an intrinsic quality of the environment.

The longer it is, the more you can leverage it to be more efficient. If it is short, perhaps because there are issues like market fit involved, such that the initial stages need viability testing first, then that sets the planning length, and indirectly the type of organization that can be applied.

But, almost by definition, that first stage can’t be massive, so for volatile environments, it is far better to just find the shortest path forward. The only caveat there is that that method of working itself needs to be changed if the first stage turns out to be viable and the product needs to grow massive. Basically, absolutely every assumption needs to be reset and redone from scratch.

That indirectly says a lot about why trying to treat everything from a reactive, short-term perspective fails so spectacularly when applied to big projects. If there is a strong likelihood of a long term, then not utilizing that, and basically treating all of the sub-parts as if they are independent, is going to derail frequently, and eventually prevent the rest of the work from getting completed.

If one avoids explicitly organizing things, then since it doesn’t happen by accident, growth will be chaotic. It’s in and around this trade-off that we realize that there can be no ‘one-size-fits-all’ approach for all sizes and time frames.

With all of that in mind, if we want to do something massively complex, then organization and planning are essential to ensuring that the complexity won’t overwhelm it. If we want to do something we think is trivial, but it turns out that it isn’t, we pretty much need to return to the same starting point and do it all over again. If we don’t, the accumulated disorganization will shut it all down prematurely. That is hardly surprising; watch master craftsmen in various domains work and you will often see them keep their workspaces tidy and clean up as they go. It’s one of the good habits that helped them master their craft.

Friday, March 13, 2020

Testing for Oversimplification

For any ‘lump’ of complexity there is an ‘inside’ and an ‘outside’.

From the outside, things always look a lot simpler. The devil is in the details, and the details are always on the inside. If you really want to know something, you have to get depth, which is basically digging around on the inside trying to, at very least, inventory most of the details.

So, now with that as a foundation, if someone presents a set of primitives that they claim solves some aspect of the complexity, it would be really useful to have some test that can be applied to see if their work is correct, an oversimplification, or even possibly over-the-top.

The means to do that would be to get some inside knowledge. That would give us a sense of the frequency of things that occur. With that frequency, we could take some of the more frequent ‘corner-cases’ and see if they fit into the primitives.

If we find that some fairly frequent thing cannot be represented within the primitives, but many things can, then the most likely explanation is that the solution is oversimplified.

That’s a pretty abstract way of looking at it. Let's try an example.

In most companies, there is some reporting hierarchy. There are employees that are direct reports for a boss. From the outside it might seem like an N-ary tree is a great way to represent this as a data-structure. Then we might go forth and start implementing any collection for this type of data as a tree.
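
As a concrete sketch, the tree version might look like the Python below; the class and field names are illustrative assumptions, not a recommendation.

```python
# A minimal sketch of the N-ary tree idea: each employee has exactly one
# boss (the parent) and any number of direct reports (the children).

class Employee:
    def __init__(self, name, boss=None):
        self.name = name
        self.boss = boss          # exactly one parent; None for the root
        self.reports = []         # direct reports
        if boss is not None:
            boss.reports.append(self)

# Example: a tiny hierarchy.
ceo = Employee("CEO")
cto = Employee("CTO", boss=ceo)
dev = Employee("Developer", boss=cto)
```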

However, it is not infrequent that some employees, for some period of time, report to multiple people. This happens for a lot of reasons, and it is more likely where the employee roles are less defined. How does our tree idea work with this?

A singly linked tree, where the parent points at a set of children, obviously won’t work. A doubly linked one, where the children point back up to the parent as well, doesn’t help. If a child node points to multiple parent nodes, that correctly holds this relationship, but technically the data structure isn’t a tree anymore; it is now either a directed acyclic graph (DAG) or a full graph. We'd rather have the former, in that the latter opens up cycles, which can be difficult to deal with properly in the system.
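
A sketch of that multi-parent variant, continuing the hypothetical Python example above, shows how the structure quietly stops being a tree:

```python
# Once an employee can point to multiple bosses, the structure is a DAG
# (assuming cycles are still forbidden), not a tree.

class Employee:
    def __init__(self, name):
        self.name = name
        self.bosses = []      # zero or more parents
        self.reports = []     # zero or more children

    def add_boss(self, boss):
        self.bosses.append(boss)
        boss.reports.append(self)

dev = Employee("Developer")
manager_a = Employee("Manager A")
manager_b = Employee("Manager B")
dev.add_boss(manager_a)
dev.add_boss(manager_b)   # two parents: no longer representable as a tree
```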

So, using a tree (of any type) is an oversimplification, a DAG seems pretty good and a full graph is likely over the top.

Or is it? What if we just changed the definition of ‘reporting’ to be ‘direct reporting’ to one and only one person? Then the employee might have zero or more indirect bosses, but always just one that captures the HR relationship.

So, we think we’ve fixed the problem by taking the concept of ‘reporting’ and categorizing it into 2 distinct groups, the first of which maintains the tree relationship. But what about the second? We could have some ‘dotted line’ internal structure working through our tree, but since the indirect reports can be 0, or 1, or 2, or N, we suddenly fall back to the dotted-line relationships forming a DAG again. Even if a tree is overlaid on a DAG with two different types of structural referencing, stepping back a bit shows that the bounding structure is actually still a DAG.
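
A sketch of that split, again with assumed names, makes the point visible: the ‘direct’ links alone form a tree, but the moment the dotted lines are included, the bounding structure is back to being a DAG.

```python
# One mandatory HR relationship plus any number of 'dotted line' bosses.

class Employee:
    def __init__(self, name, direct_boss=None):
        self.name = name
        self.direct_boss = direct_boss    # exactly one; None for the root
        self.dotted_line_bosses = []      # zero or more indirect bosses

# The direct_boss links form a tree, but direct_boss plus
# dotted_line_bosses together still describe a DAG.
```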

More to the point, if the system needs to capture the way people report to each other, then just throwing away half of that so that we can jam the whole thing into a tree is not satisfying the initial requirements. It will just cause other problems.

As people often do, the next obvious approach is to just disconnect the whole thing and not think of it as a data structure.

Any employee can point to a list of bosses and any bosses can point to a list of employees. Super flexible, right? Except that without restricting it, in any way, a boss can point to an employee who can then point back to being their boss. We can create a cycle, really easily.

If we check for that special case, is that better? Not really, in that cycles of length 2 could be caught and stopped, but then we have to do the same for cycles of length 3, then 4, then ... So now, we have some huge cycle checker in the code to avoid crashes or endless loops, but if it does find something wrong, we may need to get a human to intervene and arbitrate in some way. Either the new link causes the cycle, or one of the old links is way wrong. The code can’t tell accurately. So, we tumble farther down the rabbit hole, while trying to avoid what is basically the intrinsic nature of the original data.
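
To get a feel for how much machinery that checking drags in, here is a minimal cycle check sketched in Python; it assumes each employee carries a list of bosses, as in the earlier hypothetical sketch, and walks those links depth-first before a new link is added.

```python
# A minimal sketch of the cycle check that the fully disconnected
# 'lists pointing at lists' approach ends up needing.

def would_create_cycle(employee, proposed_boss):
    """Return True if making proposed_boss a boss of employee creates a cycle."""
    seen = set()
    stack = [proposed_boss]
    while stack:
        current = stack.pop()
        if current is employee:
            return True               # employee is reachable from its proposed boss
        if id(current) in seen:
            continue
        seen.add(id(current))
        stack.extend(current.bosses)  # keep walking up the reporting links
    return False
```

Even with this in place, when a cycle is detected the code still can’t tell which link is the wrong one; a person has to arbitrate, which is exactly the rabbit hole described above.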

The big trick here is to start with needing to capture the working relationships as a requirement for the system. There may be lots of proposals for how to handle this, but if we know enough about the underlying data, we can use that knowledge to discount weak proposals really rapidly. If we leap on the first plausible-sounding idea, implement it, and then try to correct it from there, it will go badly.

Oversimplification is really, really common. For most of the stuff we encounter, we are outsiders. It takes a bit of work to prevent it, but it is the type of work that ends up saving a lot of wasted work, later.

Monday, March 9, 2020

Globals

Sometimes we encounter a bug while running software in ‘production’.

This is unavoidable in that all software is a combination of technical and domain issues; the latter means that the foundations are informal. An aspect of informality is that there are valid things that can occur that are currently unknowable, so it will always be the case that the logic of the existing code doesn’t match reality, and that that drift will ‘bug’ people. All programs will always have bugs, the best ones will have them less often.

When we’ve hit a bug, we replicate it, make code changes and then test them. However, doing a 100% full regression test for all known permutations of any underlying data is extraordinarily time intensive. Without such a test, there is a risk that the current change will break some other part of the program as a side-effect. If the code is tightly scoped, we can manually assert that all of its effects will stay within the scope, and we can manually check the entire scope to ensure that those effects are acceptable.

A trivial example is that there is a typo in a message string to the user. If that string is only used in one place in the code, then any changes to it will only affect that one usage. It’s a pretty safe change.

A non-trivial example is a change to the structure of the persisted data, perhaps combining 2 previously separate fields into one. That change includes some number of deletes and modifications. There are some other lines of code that are explicitly dependent on those changes. They need to be modified to deal with the new schema. But there is also another set of lines of code that is dependent on those lines, and another set dependent on those, etc., going upwards in an inverted tree. It is possible to diligently search through and find all of the affected lines of code, and then assert that there are no unintended side-effects, but for a medium or large program this too is very time intensive. It also requires significant knowledge about how the underlying language is designed, so one can search through all of the likely but not obvious corner-cases. It is far more common that people just take wild guesses and hope that they are correct.

So, if that data, or any data derived from that data can be accessed anywhere in the program, then the full scope of any possible impacts is global. The same is true for any function or method calls. If there is nothing limiting scope, then they are global.

The benefit of making things global is that you can use them anywhere. The downside is that that convenience makes it nearly impossible to properly know the impact of any dependent changes. So, basically, it’s a classic trade-off between coding faster now, or being way more confident in changing stuff later.

One way around this is to be very tight on scope for most of the data and code, say 95%. So, if a bug lands on that data or code, figuring out the impact of any changes is contained. For the other code, where it is absolutely necessary to share data and code across the entire system, the syntax should be explicit and the structural level flat.

For example, if you must have a global parameter that is used everywhere, then access to it should be via just one function call. That makes it easy to search. Its value should be the most decomposed primitive, so there should be no other decomposition operations on it. As well, all accesses to the data are read-only; only one encapsulated section of code is allowed to ever modify it. With those sorts of higher-level ‘properties’, any bug involving this data still needs to be checked across the whole code base, but the checks themselves are constrained to make them faster. You might be able to assert in minutes that changing the value will not cascade into other bugs.
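
A minimal sketch of that shape in Python follows; the module name, the parameter, and the single setter are assumptions for illustration, not a prescription.

```python
# config.py: the one encapsulated section of code that owns the global.
# All other code reads the value through a single, easily searched function.

_max_retries = 3   # a fully decomposed primitive; nothing further to unpack


def get_max_retries():
    """Read-only access for the rest of the code base."""
    return _max_retries


def _set_max_retries(value):
    """Only this module's own startup or loading code may change the value."""
    global _max_retries
    _max_retries = int(value)
```

With that discipline, checking a bug that touches the value mostly reduces to searching for the one accessor, rather than hunting down every clever, indirect way the rest of the code might have reached it.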

One of the key differences that many people miss with growing code bases is that the growth of work to assert that a change to globals wouldn’t cause side-effects is at least exponential. That happens because of the inverted tree, but it can also happen because of the language corner-cases. In a small program, you still might have to check a few places to see if there is an impact, but the program is small, so it's often just faster to scan all of the code. When the program gets to medium size, scanning the code is still barely possible, but now it might take days instead of minutes or hours. If a program is large, and has lots of authors, you can never really be sure about what language features they took advantage of, or how they obfuscated some indirect reference to the data, so the exponential increase in basic searching suddenly becomes quite obvious, and scanning all of the code impossible.

Globals are necessary and useful, but they come with a huge cost, so it's always worth spending a lot of time during development to make sure that they are not heavily impacting testing, bug fixing, extensibility, etc. It is one of those ‘a stitch in time saves nine’ issues where the payoffs might not be immediately obvious, but its contribution to failure sure is.