Thursday, July 25, 2024

Updating Software

It would be nice if all of the programmers in this world always produced high-quality bug-free software. But they don’t.

There are some isolated centers of excellence that get pretty close.

But the industry's desire to produce high quality has been waning in the face of impatience and exponentially exploding complexity. Most programmers have barely enough time to get their stuff sort of working.

Thus bugs are common, and the number in the wild is increasing at an increasing rate.

A long time ago, updates were always scheduled carefully.

Updating dependencies was done independently of any code or configuration changes made on top. We kept the different types of changes separated to make it easier to diagnose problems. The last thing you want is to deploy on top of chaos.

Sometimes we would package upgrades into a big release, but we did this with caution. Before the release we would first clean up the code, then update the dependencies, then do some light regression testing. If all of that looked good, we’d start adding the big changes on top. We would fully test both the old and new features before the release. It was a good order, but a bit expensive.

For common infrastructure, there would be scheduled updates, where all other changes were frozen ahead of and behind them. So, if an update triggered bugs, we’d know exactly what caused them. If enough teams honored the freeze, bugs introduced by an upgraded dependency would be picked up quickly and the upgrade rolled back to reduce the damage.

A lot of that changed with the rise of computer crime. Some older code is buggy, so exploits are happening all of the time. An endless stream of updates. Lots of security issues convinced people to take the easy route and turn on auto-updates. It keeps up with the patches, but the foundations have now become unpredictable. If you deploy something on top and you get a new bug, your code might be wrong or there could have been a change to a dependency below. You can’t easily tell.

The rise of apps, those targeted platform programs, also pushed auto-updates. The initial apps were kinda crude, so in some cases updates went out just to get new features delivered quickly, but also to patch major security flaws.

Auto-updates are a pretty bad idea. There are the occasional emergency security patches that need to be applied immediately, but pretty much everything else does not. You don’t want surprise changes; they breed confusion.

Part of operations should be keeping a complete inventory of critical dependencies and scanning for emergency patches. The scanning can be automated. If one shows up and it is serious, it should be updated immediately. But it always requires human intervention. Someone should assess the impact of making that change. The usage of the dependency may not warrant the risk of an immediate upgrade.
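As a rough illustration of that split between automated scanning and human judgment, here is a minimal sketch in Python. The file names, the advisory format, and the severity field are all hypothetical; a real setup would pull from vendor feeds or a vulnerability database.

    # Hypothetical sketch: compare a dependency inventory against an advisory
    # feed and flag anything critical for human review. The automation stops
    # at flagging; a person still decides whether to upgrade immediately.
    import json

    def load_json(path):
        with open(path) as f:
            return json.load(f)

    def find_critical(inventory, advisories):
        # Only advisories that touch something we actually run matter here.
        running = {(d["name"], d["version"]) for d in inventory}
        return [a for a in advisories
                if (a["name"], a["version"]) in running
                and a["severity"] == "critical"]

    if __name__ == "__main__":
        inventory = load_json("inventory.json")    # what we run, and where
        advisories = load_json("advisories.json")  # pulled periodically from vendors
        for adv in find_critical(inventory, advisories):
            print("REVIEW:", adv["name"], adv["version"], "-", adv["summary"])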

Zero-day exploits on internal-only software, for example, are not emergencies. There are no public access points for them to be leveraged. They need to get patched, but the schedule should be reasonable and not disruptive.

If we get back to handling updates that way, then most of them should return to being scheduled. It was a much better way of dealing with changes and bugs.

Software is an endless stream of changes, so to quote Ghostbusters: “Never cross the streams”. Keep your changes, dependency changes, and infrastructure changes separate. The most important part of running software is to be able to diagnose the inevitable bugs quickly. Never compromise that.

It’s worth noting that I always turn off auto-updates on all my machines. I check frequently to see if there are any updates, but I never trust auto-updates, and I do not immediately apply feature updates unless they are critical.

I have always done this because more than once in my life I have seen a big software release blow up because of auto-updates. Variances in development machines cause works-on-my-machine bugs or outages at very vulnerable moments.

Teams always need to synchronize the updates for all of their tools and environments; it is bad not to. And you never want to update anything in the last few days or weeks before a release; that is just asking for trouble.

Some people will complain that turning off auto-updates will take us back to the old days when it was not uncommon to find very stale software running out there. If it has been years of ignoring updates, the risks in applying all of them at once are huge, so each delay makes it harder to move forward. That is an operational deficiency. If you run software, you have a responsibility to update it on a reasonable basis. It is all about operations. Cheating the game with auto-updates just moves the risks elsewhere; it does not eliminate them.

Some people don’t want to think about updates. Auto-updates make that effort go away. But really you just traded it for an even worse problem: instability. A stale but reliable system is far better than the latest and greatest unstable mess. The most important part of any software is that it is trustworthy and reliable. Break those two properties and there is barely any value left.

I get that just turning on auto-updates makes life simpler today. But we don’t build software systems just for today. We build them to run for years, decades, and soon possibly centuries. A tradeoff does not eliminate the problem; it just moves it. Sometimes that is okay, but sometimes it just makes everything worse.

Thursday, July 18, 2024

Expression

Often I use the term expressibility to mean the width of all possible permutations within some usage of a formal system. So state machines and programming languages have different expressibility. The limits of what you can do with them are different.

But there is another way to look at it.

You decide you want to build a solution. It fits a set of problems. It has an inherent complexity. Programmers visualize that in different ways.

When you go to code it for the computer, depending on the language, it may be more or less natural. That is, if you are going to code some complex mathematical equations, then a mathematics-oriented language like APL would be easier. In nearly the same way we express the math itself, we can write the code.

Although it is equivalent, if you express those same equations in any imperative language, you will have to go through a lot more gyrations and transformations in order to fit them into the language. Some underlying libraries may help, but you still need to bend what you are thinking in order to fit it into the syntax.
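As a tiny, invented illustration of that drift, here is the same Euclidean distance written two ways in Python. The first reads back almost like the formula itself; the second bends it through manual index bookkeeping, which is exactly the kind of place where bugs like to hide.

    # Illustration only: d = sqrt(sum_i (a_i - b_i)^2), written two ways.
    import math

    def distance_close_to_the_math(a, b):
        # Reads back almost exactly like the formula above.
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def distance_with_gyrations(a, b):
        # The same computation, bent through manual index bookkeeping.
        total = 0.0
        i = 0
        while i < len(a):
            diff = a[i] - b[i]
            total = total + diff * diff
            i = i + 1
        return math.sqrt(total)

    print(distance_close_to_the_math([1, 2], [4, 6]))  # 5.0
    print(distance_with_gyrations([1, 2], [4, 6]))     # 5.0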

Wherever and whenever we bend, there is a significantly increased likelihood of bugs. The bends tend to hide problems. You can’t just read it back and say “Yeah, that is actually what I was thinking.” The change of expression obscures that.

A long time ago, for large but routine systems, I remember saying that the code should nearly match the specifications. If the user wrote a paragraph explaining what the code should do, the code that does the work should reflect that almost perfectly.

The variables are the user terminology; the structure is as they described. If it were tight, we could show the author of the spec the finished code and they would be able to mostly understand it and then verify that it is what they want. There would be some syntactic noise, some intermediate values, and some error handling as well, but the user would likely be able to see through all of that and know that it was correct and would match the spec.
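A minimal, made-up example of what that looks like: suppose the spec says “an order qualifies for free shipping when the total is at least 50 dollars, or the customer is a premium member.” The sketch below mirrors that sentence in the user’s own terminology, so the spec’s author could plausibly read it back and verify it. Everything here, including the names and the threshold, is hypothetical.

    # Hypothetical spec: "An order qualifies for free shipping when the order
    # total is at least 50 dollars, or the customer is a premium member."
    FREE_SHIPPING_MINIMUM = 50.00

    def qualifies_for_free_shipping(order_total, is_premium_member):
        return order_total >= FREE_SHIPPING_MINIMUM or is_premium_member

    print(qualifies_for_free_shipping(62.50, False))  # True
    print(qualifies_for_free_shipping(19.99, False))  # False
    print(qualifies_for_free_shipping(19.99, True))   # True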

That idea works well for specific complex calculations if they are tightly encapsulated in functions, but obviously, systems need a lot of other operational stuff around them to work. Still, the closer you get to that utopia, the more likely that visual inspections will bear fruit.

That doesn’t just improve quality; it also enhances debugging and discussions. If someone has a question about how the system is working and you can answer it in a few seconds, that really helps.

Going the other way, we can roughly talk about how far the code drifts away from the problem.

The programming language could be awkward and noisy. Expressing some complicated mathematics in assembler for instance would make it way harder to verify. All of the drift would have to be shoved into comments or almost no one could ever understand it.

Some languages require a lot of boilerplate, frameworks, and syntactic sugar; the expression there can bear little resemblance to the original problem.

Abstraction is another cause of drift. The code may solve a much more general problem, then need some specific configuration to scope it down to the actual problem at hand. So the expression is split into two parts.

The big value of abstraction is reuse. Code it once, get it working, and reuse it for dozens of similar problems. It is a huge time saver, but the expression becomes a little more complex.
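A small, invented sketch of that split: one general routine, and the actual problem expressed as configuration on top of it. To understand the behavior you now have to read both parts. The names and fields are made up for illustration.

    # General routine: keep the records that satisfy every configured rule.
    def filter_records(records, rules):
        return [r for r in records if all(rule(r) for rule in rules)]

    # Specific problem, expressed as configuration rather than as code
    # inside the general routine.
    overdue_invoice_rules = [
        lambda r: r["type"] == "invoice",
        lambda r: r["days_outstanding"] > 30,
    ]

    records = [
        {"type": "invoice", "days_outstanding": 45},
        {"type": "invoice", "days_outstanding": 5},
        {"type": "receipt", "days_outstanding": 90},
    ]
    print(filter_records(records, overdue_invoice_rules))
    # [{'type': 'invoice', 'days_outstanding': 45}]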

Expression in this sense isn’t that different from writing. You can say something in plain simple terms or you can hide your message in doublespeak. You might still be communicating the same things, but just making the listener's job a whole lot more difficult.

In the midst of a critical production bug, it really sucks if the expression of the code is disconnected from the behavior. At the extreme, it is spaghetti code. The twists and turns feel nearly random. Oddly, the worse the code expression, the more likely that there will be critical production bugs.

Good code doesn’t run into this issue very often; bad code hits it all of the time. Battle-tested abstract code is good unless there is a problem, but those problems are also rare. If you are fixing legacy code, most of what you will encounter will be bad code. The good or well-tested stuff is mostly invisible.

Thursday, July 11, 2024

Effective Management

What I think management theorists keep forgetting is that the rubber needs to hit the road. That is, management is a secondary occupation. It exists to make sure something actually gets done well enough, underneath.

Time is the critical component.

There are things you can do right now to alleviate an issue, which some might believe are successful. But if that short-term fix ends up making the overall problems worse later, it was not successful. It just took some time for the lagging consequences to play out.

We, as a species, seem really bad at understanding this. We clamor for fast fixes, then whine later that they were inappropriate. It would make more sense for us to accept that not all consequences are immediate and that we can trace bad things to a much earlier decision. And more importantly, we can learn from this and not keep repeating the same mistakes over and over again. We can get smarter, we can approach getting things done intelligently.

We like hierarchies to run stuff, but we seem to be foggy about their construction. People at different levels get confused about their role in making things happen. You get dictatorial leaders who make outrageous claims, and disempowered workers who are entirely disconnected from what they are doing. The circumstances spiral out of control; unfortunate things happen.

At the bottom are the people doing the work.

To scale up an endeavor, they need to be placed in little controlled boxes. They cannot and should not be free to do whatever they choose. They have a job to do, they need to do their job.

But at the same time, if the boxes are so small that they can hardly breathe, that is really bad too. They need to have enough freedom that they feel comfortable with what they are doing and can concentrate on doing their best at it. They need to be able to assess the value of what they are doing and participate in deciding if it is a worthwhile activity.

A good workforce will do good work regardless of their management. They will find better ways to get things done. If they are enabled, the quality of their work will be better. Bad management can steal credit for a good workforce; it happens all of the time.

Good management understands that their job is to make sure the environment exists to build up a good workforce. They set the stage, the tone. They can help, but they are not the workforce; they are not in as much control of the situation as they want to believe. They didn’t do the thing; they set the stage for it to get done. It is important, but it is not the main thing.

As you go up the hierarchy, the concerns of management should be time and direction.

The lower managers need to be mostly concerned with tactical issues. They need to make sure that obvious obstacles are not preventing the workforce from getting their work done.

Farther up the hierarchy the concerns should be longer-term. At the very top, the primary concern is strategy. Someone running a company should be obsessed with at least the next five years, if not far longer.

It’s the higher executives who should clue in to the negative long-term consequences. They might realize that some tactical decision to get around a problem will hurt them somewhere down the road. They should be the ones who understand enough of the landscape to find a better path forward. They might call for opinions, but they are supposed to be in the position to evaluate all of them correctly. Ultimately, direction is their decision, their responsibility.

If a company accidentally cannibalizes its own market with a less effective product, it is a strategic mistake. It may take a long time to play out but it is still a black mark against the leaders at the top. They should not have pointed the company to the edge of a cliff.

If a company lowers quality too far and eats through the value they had built up earlier, that is also a strategic mistake. They should have known exactly what the minimum quality was, and they should be ensuring that the effort does not fall below that bar. They should be able to see that a series of tactical choices doesn’t add up correctly and requires some intervention in order to keep it from getting worse. If they don’t act, that is still a decision that they made or should have made. They are still at fault.

If a company blindly trudges forward while the ground beneath them erodes and disappears, that is another common strategic mistake. Things were going so well that the upper executives stopped paying attention. They grossly overestimated the lifespan of their offerings. They should have noticed that the landscape had changed and that they needed to change direction too. They were asleep at the wheel. It is their responsibility to find and follow the road, even as it twists and turns through the forest.

So we put it all together and we can see that we have workers who are comfortable at their jobs. They are doing their best and occasionally raising concerns. Their tactical management jumps in to resolve their blockers quickly. Sometimes someone even higher jumps in later to reset the direction slightly. Overall though, the workers are clear about what they have to do, they do it well enough, and they have confidence that they are headed in the same direction as their colleagues. Drama is rare, and while there are always frustrations, they have some confidence that things will get better, even if progress is slow.

Contrast that to an organization out of control. The highest executives are stomping around making outrageous claims. These don’t match what is happening on the ground. People are mostly lost and frustrated. A few are pulling most of the weight and are exhausted. The lower management just exists to beat and whip the troops. Badness rolls downhill. Conflict is encouraged. There is often a stagnant bureaucracy that exists as nothing but roadblocks. Bad problems don’t ever get fixed; they just fester while people try to steer around them. Navigating the insanity is harder than actually doing the work. Most of what actually gets done is useless; it contributes nothing to reaching the goals. Success is rare. Overall, whatever value was created in the past is slowly decaying. It’s enough to keep the endeavor funded, but poor management is eating away at it instead of contributing to it. The overall direction is downwards and it is obvious to most people involved.

The problem with analysis is often perspective. If you take only a few viewpoints in a large organization, your understanding of the issues will be skewed. You have to step back, look at all of it, in all its ugly details, then objectively address the problems. Realistically, no one will be happy; they will always prefer that their own agenda, their own perspective, dominates. But what we have trouble understanding is that the best we can do collectively is not the best we can do individually. Management is not about optimizing individual careers; it is about making sure that a group of people can get a lot of work done as effectively as the circumstances allow. All that really matters in the end is the long term. Everything else is a distraction.

Thursday, July 4, 2024

AI

I was interested in neural nets over thirty years ago. A friend of mine was engrossed in them and taught me a lot. They were hot for a while, but then they faded away.

As the AI winter began thawing, I dipped in periodically to see how it was progressing.

The results were getting amazing. However, I tended to see these large language models as just dynamically crafting code to fit specific data. Sure, the code is immensely complex, and some of the behaviors are surprising, but I didn’t feel like the technology had transcended the physical limitations of hardware.

Computers are stupid; software can look smart, but it never is. The utility of software comes from how we interpret what the computer remembers.

A few weeks ago I was listening to Prof. Geoffrey Hinton talk about his AI concerns. He had survived the winter in one of our local universities. I have stumbled across his work quite often.

You have to respect his knowledge, it is incredibly deep. But I was still dismissing his concerns. The output from these things is a mathematical game, it may appear intelligent, but it can’t be.

As his words sank deeper, I started thinking back to some of Douglas Hofstadter’s work. Gödel, Escher, Bach is a magnum opus, but I read some of his later writings where he delved into epiphenomena. I think it was I Am a Strange Loop, where he was making an argument that people live on in others’ memories.

I didn’t buy that as a valid argument. Interesting, sure, but not valid. Memories are static; what we know of intelligent life forms is that they are always dynamic. They can and do change. They adjust to the world around them; that is the essence of life. Still, I thought the higher concept of an epiphenomenon was itself interesting.

All life, as far as I know, is cellular. Roger Penrose in The Emperor's New Mind tried to make the argument that our intelligence and consciousness on top of our bodies sprang from the exact sort of quantum effects that Einstein so hated. Long ago I toyed with the idea that that probabilistic undertone was spacetime, as an object, getting built. I never published that work, early readers dismissed it quite rapidly, but that sense that the future wasn’t written yet stayed with me. That it all somehow plays back into our self-determination and free will as Penrose was suggesting. Again, another interesting perspective.

And the questions remained. If we are composed of tiny biological machines, how is it possible that we believe we are something else entirely on top of this? Maybe Hofstadter’s epiphenomena really are independent of their foundations? Are we entities in our own right, or are we just clumps of quadrillions of cells? A Short History of Nearly Everything by Bill Bryson helps to muddle that notion even further.

Does it roll back to Kurt Gödel’s first incompleteness theorem, that there are things, true things, that are entirely unreachable from the lower mechanics? I’ll call them emergent properties. They seem to spring out of nowhere, yet they are provably true.

If we searched, would we find some surprising formula that dictates the construction of sequential huge prime numbers, starting at a massive one and continuing for a giant range? Except for actually calculating it all out and examining it, we’d be totally unaware of the formula’s existence. Nothing about the construction of primes themselves would lead us to deduce this formula. It seems to be disconnected. Properties just seem to emerge.

Gödel did that proof for formal systems, which we are not, but we have become the masters of expressing the informal relationships that we see in our world with formal systems, so the linkages between the two are far tighter than we understand right now.

The argument that our sense of self is an extraordinarily complex epiphenomenon that springs to “life” on top of a somewhat less than formal biological system, one that is in the middle of writing itself out, is super fascinating. It all sort of ties itself together.

And then it scared me. If Hinton is correct then an AI answering questions through statistical tricks and dynamic code is just the type of complex foundation on which we could see something else emerge.

It may just be a few properties short of a serious problem right now. But it is possibly worse, because humans tend to randomly toss things into it at a foolish rate. A boiling cauldron of trouble.

We might just be at that moment of singularity, and we might just stumble across the threshold accidentally. Some programmer somewhere thinks one little feature is cool, and that is just enough extra complexity for a dangerous new property to emerge, surprising everyone. Oops.

That a stupid computer can generate brand new text that is mostly correct and sounds nearly legitimate is astounding. While it is still derived from and bounded by a sea of input, I don’t think it has crossed the line yet. But I am starting to suspect that it is too close for comfort now. That if I focused really hard on it, I could give it a shove to the other side, and what’s worse is that I am nowhere close to being the brightest bulb in the amusement park. What’s to keep someone brilliant, near genius, from just waking up one night and setting it all off, blind to the negative consequences of their own success?

After the AI winter, I just assumed this latest sideshow was another fad that would fade away when everyone got bored. It will, unless it unleashes something else.

I did enjoy the trilogy Wake, Watch, Wonder by Robert J. Sawyer, but I suspect that the odds of a benevolent AI are pretty low. I’d say we have to slow way, way down, but that wouldn’t stop progress from the dark corners of the world.

If I had a suggestion, it would be to turn directly into opening Pandora’s box, but to do it in a very contained way. A tightly sandboxed testnet that was locked down fully. Extra fully. Nothing but sneakernet access. A billion-dollar self-contained simulation of the Internet, with an instantaneous kill switch and an uncrossable physical moat between it and the real world. Only there would I feel comfortable deliberately trying out ideas to see if we are now in trouble or not.