Thursday, October 26, 2023

Two Generals

There are two armies, encamped on different sides of a large city.

If both armies attack the city at exactly the same time they will be victorious. If only one army attacks, the city defenders can wipe it out.

One of the generals wants to notify his peer in the other army about when to begin the attack. He sends out a message “Tomorrow at 6am”. But he receives no reply.

He has a huge problem now. Did his messenger make it to the other general? Maybe he did, but the returning messenger was captured and killed. Or maybe his messenger never made it: while sneaking around the outskirts of the city, he met an untimely end.

So, tomorrow morning, should he attack as he said he would, assuming the messenger was successful? Or should he not attack?

Whether the message was received or not is ‘ambiguous’. The general cannot know which of the two possibilities is true: that his message didn’t make it, or that the reply didn’t come back. He doesn’t have enough information to make an informed decision.

Yet the fate of the battle rests on reliable communications…

I’m sure that many readers have various suggestions for ways to remove the ambiguity. For instance, you could send other messengers, lots of them. But if they just disappear too, then the ambiguity is still there. You could try to establish waypoints, so the distance of the communication is shorter. But if there is still even a tiny corridor where the defenders reign supreme, it makes no difference.

What if instead of sending “tomorrow at 6am” you send it as a question: “What about tomorrow at 6am?” Then if it is intercepted, the general isn’t compelled to act. But now the other general has the exact same original problem with their reply. Asking swapped who holds the problem; it didn’t make it go away.
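To make that regress concrete, here is a tiny Python sketch (the names are mine, purely illustrative) of an alternating exchange: whoever sent the most recent message is the one left without confirmation, no matter how many replies are stacked on top.

```python
def uncertain_general(n_messages: int) -> str:
    """Generals A and B alternate messages, with A sending first.

    The sender of the final message never sees an acknowledgment for it,
    so that side is the one left acting on faith.
    """
    return "A" if n_messages % 2 == 1 else "B"

for n in range(1, 6):
    print(f"after {n} message(s), general {uncertain_general(n)} is uncertain")
```

Adding another reply only flips which side is uncertain; no finite number of messages removes the uncertainty for both at once.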

Clever people might suggest an alternative medium, like flags. But the city is large enough that there is no guarantee of visibility, and if you used smoke or balloons, the defenders could just clear them away. The medium isn’t the problem.

The thing is, there is no perfect, always-working scheme. Wherever you need information but have only an ambiguity, no matter how small you shrink it, it still remains. The ambiguity is the information.

Worse is that this is a fundamental physical constraint imposed by the universe on any sort of distributed communication. All distributed software, where separate computations need to synchronize over any sort of imperfect medium, is bound by it. If you have a client and server, a bunch of peers, or even two processes using a less-than-perfect medium such as files, no technology, protocol, or magic bullet will make the conversation 100% reliable. Even at 99.9999% reliability, there is still some sort of ambiguity in your way.

At your very best you might shrink it down to, say, a one-in-a-hundred-years likelihood of failure, so you might never see it go wrong in your entire career, but it will still go wrong someday. And that is what makes it so different from a regular computation: it is not deterministic.
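As a rough sketch of that shrinking-but-never-zero failure rate (the numbers here are illustrative, not from the post): if each messenger is lost independently with probability p, sending k duplicates leaves the ambiguity in place only when all k are lost, which decays fast but never actually reaches zero.

```python
# If each messenger is independently lost with probability p_loss, the
# ambiguity survives only when ALL k duplicates are lost: p_loss ** k.
p_loss = 0.5

for k in (1, 5, 10, 20, 50):
    residual = p_loss ** k
    print(f"{k:2d} messengers: ambiguity remains with probability {residual:.3e}")

# No finite number of duplicate messengers makes the residual exactly zero.
assert all(p_loss ** k > 0 for k in range(1, 1000))
```

You can buy as many nines as you like, but the exponent never takes you all the way to certainty.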

Short of some unexpected external event like a flood, gamma rays, or a hardware defect, all of the other computations will work perfectly each and every time they run on the computer. If they work for their full context, you can assume they will always work. Oddly, we treat them as 100% reliable; even though the physical computation itself is subject to adverse events, the software itself, as a formal system, is not.

There is and will always be a huge gulf between 100% and 99.9...%, between deterministic computations and non-deterministic (in the distributed sense, not the language theory one) ones. Ultimately it affects the level of trust that we place in our software.

Thursday, October 19, 2023

Impermanence

Physical things can hang around for a long, long time. You can keep them in your house, for example. Short of some epic disaster like a fire, it is up to you how long you value them and when you finally get rid of them. It is in your control. For some people, it is possible to keep their stuff safe for their entire life. It becomes an heirloom.

Digital things are impermanent. There are an endless number of ways for them to get lost, forgotten, or corrupted.

The next hardware you buy is probably only good for five years, maybe a little bit longer. Rolling over to newer hardware probably won’t go well, even if you spend a ridiculous amount of time trying to figure it out.

The cloud storage you pay for may be discontinued or out of business next week. They probably scrimped on backups, so odds are that in a disaster the stuff you wanted won’t make it. They’ll apologize, of course. And they are always working harder on finding ways to increase the price than on making the service more reliable; engineering is not as important as exploitation. You may wake up one day to a nasty price increase. Storing stuff in the cloud is extremely hazardous.

Compounding it all, software keeps changing. Most programmers suck at achieving any real sort of backward compatibility. They’ll just force you to wipe out everything because it was easier for them to code it that way. The chances that persistent data survives for longer than a decade are slim. And even if technically it did survive, it has probably become unreachable, unusable. The software you used before to leverage it has long since been broken by someone else who didn’t know or care about what you need.

There was a big stink about forgetting things on the web, but honestly, it was a total waste of time. Not only is the web tragically forgetful, but the infrastructure on the web is getting a little more useless every day. Technically stuff could still be out there, but how would you find it now? The web is about hype, not information. Stuff tends to vanish when the limelight gets shifted.

In the grand scheme of things, the digital realm is extraordinarily flaky. Far more of its history is lost than preserved. It’s a dangerous, somewhat unpredictable place where selfishness and irrationality have more weight than quality. The odds are better that someone you don’t want to see your data will see it than that you will get that same data back in twenty or thirty years. We make stuff digital to be trendy, not smart.

Thursday, October 12, 2023

Curiosity

If I had to pick one quality that I think is necessary to make a big software project run smoothly, it would be curiosity.

If you get code that perfectly fits the domain problems, the project will be great. If you correctly guessed in advance how long it would take, and people actually believed you and gave you that time, the politics would be negligible.

But anyone who has been on a bunch of big projects for a long time knows that it almost never works that way.

It’s usually some sort of dumpster fire. The time isn’t enough, the fit is bad, and the technology is flaky.

So, ultimately, for the first few versions of the code, you do what you need to do in order to get it out the door. It ain’t pretty, but it seems to be an inescapable reality of the job.

But after the smoke clears, and if there is still an appetite to go forward, the circumstances have changed. Hopefully, there is confidence in the work now.

Curious people will look back at what happened earlier, and at what they have, and start asking the hard questions. Why did someone do that? How is that supposed to work? And so on.

If they are curious and enabled, then at least some of the huge collection of smaller problems that plagued the earlier versions are now in their focus. They will get looked at, and hopefully corrected. All of that happens outside of the main push for whatever new features other people desire.

It is fundamentally cleanup work. There is a lot of refactoring, or replacement of weak parts. There is more investigation, and deep diving into the stranger issues. All of it is necessary in order to build more on what is there now.

Non-curious people will just claim that something is already in production, that it is locked that way, and that it should not be touched, ever. They will enshrine the damage and try to move on to other things. That is a classic mistake: building anything on a shaky foundation is a waste of time. But you have to be curious about the foundation in order to assess that it is not as good as necessary.

Thursday, October 5, 2023

Feedback

I was working with a group of people a while ago who built a web app and put it on the Internet. It was a bit crude, didn’t quite follow normal user conventions, and was quite rough around the edges. When they built it, they added a button to get users' feedback.

Once they put it out there live, they got swamped by negative feedback. People took the time to complain about all sorts of things, to point out the deficiencies. It was a lot. They were overwhelmed.

So, they removed the feedback button.

As far as solving problems goes, this was about the best example of the worst possible way to do it. They had this engaged group of people who were willing to tell them what was wrong with their work, and instead of listening to that feedback and improving, they just shut down the interaction.

Not surprisingly, people avoided the app and it never really took off.

For programmers, feedback is difficult. We are already on thin ice when we build stuff, as there is more we don’t know about what we are doing than what we do know. And it is easy to throw together something quickly, but it takes a crazy long time to make it good. This all leaves us with a perpetual feeling of uncertainty, that the things we build could always be way better. You never really master the craft.

Those nagging doubts tend to make most programmers highly over-sensitive to criticism. They only want positive feedback.

On top of that, user feedback is almost never literal. The users are vague and wishy-washy when they talk about what is wrong or why something bothers them.

They often know what they don’t like, but they do not know what is better or correct. Just that it is wrong. They are irrational and they usually don’t like fully explaining themselves.

In order to make sense of what they are saying, you have to learn to read between the lines. Use what they say to get an idea that something might be wrong, then work out the actual problems on your own. Once you think you understand, you change things and test to see if the result is better. It’s a soft, loose process that usually involves endless rounds of refinement.

Things gradually get better, but it isn’t boolean. It’s not done or undone; it is a convergence. Treating it as one discrete task is a common mistake built into popular feedback-tracking tools. They confuse the rigor of issuing instructions to a computer with the elasticity of interfaces. They try to treat it all the same when it is completely different.

If you were being pragmatic about it, you would capture all of the feedback and triage it: positive or negative, categorized by the visible features involved. Then you might collect together certain negative entries and hypothesize that the underlying cause is something tangible. From there, you would schedule some work, and then some form of testing. The overall category of the problem, though, would likely never really go away, never get resolved. It would stay there for the life of the system, just something that you are gradually working towards improving.
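As a minimal sketch of that triage step (the structure and field names here are my own assumptions, not any real tool): group each piece of feedback by feature and sentiment, so clusters of negative entries on the same feature stand out as candidates for a tangible underlying cause.

```python
from dataclasses import dataclass
from collections import defaultdict

@dataclass
class Feedback:
    text: str
    sentiment: str  # "positive" or "negative"
    feature: str    # the visible feature the user was touching

def triage(entries):
    """Bucket feedback by (feature, sentiment) so recurring negative
    clusters are easy to spot and hypothesize about."""
    buckets = defaultdict(list)
    for e in entries:
        buckets[(e.feature, e.sentiment)].append(e)
    return buckets

entries = [
    Feedback("search feels slow", "negative", "search"),
    Feedback("can't find the export button", "negative", "export"),
    Feedback("search results seem off", "negative", "search"),
    Feedback("love the dashboard", "positive", "dashboard"),
]
for (feature, sentiment), items in sorted(triage(entries).items()):
    print(f"{feature}/{sentiment}: {len(items)} entries")
```

The point of the grouping is not to close tickets; it is to surface the categories that stay open for the life of the system and get gradually worked down.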

The classic example is when the users say that the system or some of its features are “awkward”. That most often means that parts of the behavior do not ‘fit’ well with the users as they deal with their problem domain. It could be because the workflow is wrong, or that the interface conventions clash with the other tools they are using, or that the features should have been somewhere else, or that it is all too slow to be usable. It is hard to tell, but it is still vital feedback.

You don’t “de-awkward” a system; it is not a ‘thing’. It’s not a requirement, a ticket, a feature request, anything really. It is more about the ‘feel’ the users experience while using the features. If you want to make it less awkward, you probably have to directly interact with the users while they are doing the things they find awkward, then take the friction points you observed and guess at how to minimize them. You definitely won’t be 100% correct; you might not even be 10% correct. It will take a lot of iteration to finally put your finger on the kinds of things that ‘you’ can do to improve the situation.

A rather huge problem in the software industry is that most people don’t want to do the above work. They only want to solve contained, discrete little problems, not get lost in some undefinable, unquantifiable swamp. We lay out methodologies, build tools, and craft processes on the assumption that all things are atomic, discrete, and tangible, and it shows in our outputs. ‘Awkward’ comments are ignored. The bad behaviors get locked in, unchangeable. People just wrap more stuff around the outside, but it too suffers from the same fate. Eventually, we give up, start all over again from first principles, and arrive at the same conclusion. It’s an endless cycle where things get more complicated but gradually less ‘user-friendly’.