Thursday, July 25, 2024

Updating Software

It would be nice if all of the programmers in this world always produced high-quality bug-free software. But they don’t.

There are some isolated centers of excellence that get pretty close.

But the industry's desire to produce high quality has been waning in the face of impatience and exponentially exploding complexity. Most programmers have barely enough time to get their stuff sort of working.

Thus bugs are common, and the number in the wild is increasing at an increasing rate.

A long time ago, updates were always scheduled carefully.

Updating dependencies was done independently of any code or configuration changes made on top. We kept the different types of changes separate to make it easier to diagnose problems. The last thing you want is to deploy on top of chaos.

Sometimes we would package upgrades into a big release, but we did this with caution. Before the release we would first clean up the code, then update the dependencies, then do some light regression testing. If all of that looked good, we'd start adding the big changes on top. Before the release itself we would do full testing, covering both old and new features. It was a good order, but a bit expensive.

For common infrastructure, there would be scheduled updates, with all other changes frozen ahead of and behind them. That way, if an update triggered bugs, we'd know exactly what caused them. If enough teams honored the freeze, bugs introduced by an upgraded dependency would be caught quickly and the upgrade rolled back to limit the damage.

A lot of that changed with the rise of computer crime. Plenty of older code is buggy, so exploits surface all of the time, driving an endless stream of updates. The flood of security issues convinced people to take the easy route and turn on auto updates. That keeps up with the patches, but the foundations have become unpredictable. If you deploy something on top and hit a new bug, your code might be wrong, or a dependency below might have changed. You can't easily tell which.

The rise of apps (small, platform-targeted programs) also pushed auto updates. In some cases it was just to get new features out quickly, since the initial apps were kinda crude, but it was also to patch major security flaws.

Auto updates are a pretty bad idea. The occasional emergency security patch needs to be applied immediately, but pretty much everything else does not. You don't want surprise changes; they breed confusion.

Part of operations should be maintaining a complete inventory of critical dependencies and scanning it for emergency patches. The scanning can be automated. If a patch shows up and it is serious, it should be applied immediately. But it always requires human intervention. Someone should assess the impact of making that change; the way the dependency is used may not warrant the risk of an immediate upgrade.
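As a rough sketch of what that automated scan could look like, assuming a plain-text inventory with one "ecosystem name version" entry per line (a format I made up for illustration) and the public OSV.dev vulnerability database:

    # Sketch: scan a dependency inventory against the OSV.dev database.
    # The inventory format is hypothetical: "ecosystem name version" per
    # line, e.g. "PyPI requests 2.31.0".
    import json
    import sys
    import urllib.request

    OSV_QUERY_URL = "https://api.osv.dev/v1/query"

    def known_vulns(ecosystem, name, version):
        """Ask OSV.dev for known vulnerabilities in one dependency."""
        payload = json.dumps({
            "package": {"ecosystem": ecosystem, "name": name},
            "version": version,
        }).encode("utf-8")
        req = urllib.request.Request(
            OSV_QUERY_URL, data=payload,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp).get("vulns", [])

    def scan(inventory_path):
        with open(inventory_path) as f:
            for line in f:
                if not line.strip() or line.startswith("#"):
                    continue
                ecosystem, name, version = line.split()
                for v in known_vulns(ecosystem, name, version):
                    # Flag for human review; never auto-apply the patch.
                    print(f"REVIEW: {name} {version}: {v['id']} "
                          f"{v.get('summary', '')}")

    if __name__ == "__main__":
        scan(sys.argv[1])

Note that the script only prints REVIEW lines; deciding whether to actually upgrade stays with a person, as argued above.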

Zero-day exploits in internal-only software, for example, are not emergencies. There are no public access points through which they can be leveraged. They still need to be patched, but on a reasonable schedule that is not disruptive.

If we go back to that discipline, then most updates should once again be scheduled. It was a much better way of dealing with changes and bugs.

Software is an endless stream of changes, so to paraphrase Ghostbusters: "Never cross the streams." Keep your code changes, dependency changes, and infrastructure changes separate. The most important part of running software is being able to diagnose the inevitable bugs quickly. Never compromise that.
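As an illustration of one way to keep those streams separate, here is a sketch of a pre-merge check for a git repository; the manifest filenames and the base branch are assumptions for the example, not a standard:

    # Sketch: refuse change sets that mix application code changes with
    # dependency manifest changes. Manifest names are examples only.
    import subprocess
    import sys

    DEPENDENCY_MANIFESTS = {"requirements.txt", "package-lock.json", "go.sum"}

    def changed_files(base="origin/main"):
        """List files changed relative to the base branch."""
        out = subprocess.run(
            ["git", "diff", "--name-only", base],
            capture_output=True, text=True, check=True,
        )
        return [f for f in out.stdout.splitlines() if f]

    def main():
        files = changed_files()
        deps = [f for f in files if f in DEPENDENCY_MANIFESTS]
        code = [f for f in files if f not in DEPENDENCY_MANIFESTS]
        if deps and code:
            print("Crossed streams: split this into two change sets.")
            print("  dependency changes:", ", ".join(deps))
            print("  code changes:", ", ".join(code))
            return 1
        return 0

    if __name__ == "__main__":
        sys.exit(main())

Wired into continuous integration, a non-zero exit blocks the merge until the dependency bump and the code change land separately, so any new bug points at exactly one stream.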

It's worth noting that I always turn off auto updates on all of my machines. I check frequently for pending updates, but I never trust auto updates, and I don't apply feature updates immediately unless they are critical.

I have always done this because, more than once in my life, I have seen a big software release blow up due to auto updates. Variances between development machines cause works-on-my-machine bugs or outages at very vulnerable moments.

Teams need to synchronize updates across all of their tools and environments; leaving machines to drift apart is how those variances creep in. And you never want to update anything in the last few days or weeks before a release; it is just asking for trouble.
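A small sketch of how a team might detect that kind of drift, assuming a hand-maintained pin list; the tools and versions here are placeholders:

    # Sketch: warn when local tool versions drift from the team's pins.
    # The PINNED table is a placeholder; a real team would maintain it.
    import subprocess
    import sys

    PINNED = {
        # tool -> (command that prints its version, expected substring)
        "python3": (["python3", "--version"], "3.12"),
        "node": (["node", "--version"], "v20"),
    }

    def check():
        failures = 0
        for tool, (cmd, expected) in PINNED.items():
            try:
                out = subprocess.run(
                    cmd, capture_output=True, text=True
                ).stdout.strip()
            except FileNotFoundError:
                print(f"MISSING: {tool}")
                failures += 1
                continue
            if expected not in out:
                print(f"DRIFT: {tool} reports '{out}', expected {expected}")
                failures += 1
        return failures

    if __name__ == "__main__":
        sys.exit(check())

Updating the pin list then becomes a deliberate, scheduled team decision instead of whatever an auto-updater did overnight.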

Some people will complain that turning off auto updates will take us back to the old days, when it was not uncommon to find very stale software running out there. After years of ignored updates, the risk of applying them all at once is huge, so each delay makes it harder to move forward. But that is an operational deficiency. If you run software, you have a responsibility to update it on a reasonable schedule. It is all about operations. Cheating the game with auto updates just moves the risks elsewhere; it does not eliminate them.

Some people don’t want to think about updates. Auto updates make that effort go away. But really you just traded it for an even worse problem: instability. A stale but reliable system is far better than the latest and greatest unstable mess. The most important part of any software is that it is trustworthy and reliable. Break those two properties and there is barely any value left.

I get that just turning on auto updates makes life simpler today. But we don't build software systems just for today. We build them to run for years, decades, and soon possibly centuries. A tradeoff does not eliminate the problem; it just moves it. Sometimes that is okay, but sometimes it just makes everything worse.
