Friday, August 1, 2025

Encapsulation vs Fragmentation, Again

Long ago, I noticed a trend. Coming out of the eighties, people had been taking deep abstractions and encapsulating them into very powerful computational engines. That approach gave rise to formalized variations like data structures, object-oriented programming, etc.

But as the abstractions grew more sophisticated, there was a backlash. The industry was exploding in size, and with more new people, a lot of programmers wanted things to be simpler and more independent. Leveraging abstractions requires learning and thinking, but that slows down programming.

So we started to see this turn towards fragmented technologies. Instead of putting your smarts all in one place, you would just scattershot the logic everywhere. Which, at least initially, was faster.

If you step back a bit, it is really about individual programmers. Do you want to slowly build on all of these deep, complicated technologies, or just chuck out crude stuff and claim success? Personal computers, the web, and mobile all strove for decentralization, which you leveraged with lots of tiny fragments. Then you only had to come up with a clever new fragment, and you were happy.

Ultimately, it is an organizing problem. A few fragments are fine, but once there are too many, the sheer number of them amplifies the complexity until it is unmanageable. Doomed.

Once you have too many, you’ll never get it stable; you fix one fragment, and it breaks a couple of others. If you keep that up, eventually you cycle all the way back around again and start unfixing your earlier fixes. This is pretty much guaranteed at scale, because the twisted interconnections between all of the implicit contextual dependencies are a massive Gordian knot.

Get enough fragments, and it is over. Every time, guaranteed.

Oddly, the industry keeps heading directly into fragmentation, promoting it as the perfect solution, then watching it slowly blow up. After which it will admit there was a problem, switch to some other new fragmented alternative, and do it all over again. And again.

I guess microservices have become a rather recent example. 

We tried something similar in the early '90s, but it did not end well. A little past the turn of the century, that weed sprang up again.

People started running around saying that monoliths are bad. Which isn’t quite true: all of your pieces are together in one central place, which is good, but the cost of that is a limit on how far you can scale them.

The problem isn’t centralization itself, but rather that scaling is not and never will be infinite. The design of any piece of software constrains it to run well within only a particular range of scale. It’s essentially a mechanical problem dictated by the physics of our universe.

Still, a movement spawned off that insisted that with microservices, you could achieve infinite scaling. And it was popular with programmers because they could build tiny things and throw them into this giant pot without having to coordinate their work with others. Suddenly, microservices were everywhere, and if you weren't doing them, you were doing it wrong. The fragmentation party was in full swing.

There was an old argument on the operating system side between monolithic kernels and microkernels. Strangely, most of the industry went with one big messy thing, but ironically, that difference was about encapsulation, not fragmentation. So what we ended up with was one big puddle of grossly fragmented modules, libraries, and binaries that we called a monolith, since that was what sat on top, instead of a more abstracted and encapsulated architecture that imposed tighter organizational constraints on the pieces below.

So it was weird that we abused the terminology to hide fragmentation, then countered a bit later with a fully fragmented ‘micro’ services approach with the opposite name. Software really is an inherently crazy industry if you watch it long enough.

These days, there seems to be a microservices backlash, which isn’t surprising given that it is possibly the worst thing you can do if you are intentionally building a medium-sized system. Most systems are medium-sized. 

Whenever you try to simplify anything by throwing away any sort of organizing constraints, it does not end well. A ball of disorganized code, data, or configs is a dead man walking. Even if it sort of works today, it’s pretty much doomed long before it pays for itself. It is a waste of time, resources, and effort.

All in all, though, the issue is just about the pieces. If they are all together in one place, it is better. If they are together and wrapped up nicely with a bow, it is even better still.

If they are strewn everywhere, it is a mess, and what is always true about a mess is that if it keeps growing, it will eventually become so laborious to reverse its inherent badness that starting over again is a much better (though still bad) choice. 

The right answer is to not make a mess in the first place, even if that is slower and involves coordinating your work with a lot of other people.

The best answer is still to get it all into reusable, composable pieces so that you can leverage them to solve larger and larger problems quickly and reliably. That has been and will always be the most efficient way forward. When we encapsulate, we contain the complexity. When we fragment, we multiply it. Serious software isn’t about writing code; it is about controlling complexity. That has not changed in decades, even though people prefer to pretend that it has.

Friday, July 25, 2025

Determinism

Having been around for a long time, I often realize that when I use terms like ‘determinism’, I have a slightly different, somewhat deeper sense of its meaning.

In general, something is deterministic if, no matter how often you do it, the results are always the same. Not similar, or close, but actually the same.

Computers are interesting beasts. They combine the abstract formalism of mathematics with a strong footprint in reality, as physical machines. Determinism is an abstract concept. You do something and 100% of the time, the results are the same. That we pile on massive amounts of instructions on top of these formal systems and interpret them with respect to our humanity does not change the notion of determinism. What does mess with it a bit is that footprint in reality.

Hardware is physical and subject to the informal whims of the world around us. So, sometimes it fails.

Within software, though, we effectively disconnect ourselves from that binding to reality. We ignore it. So, we do say that an algorithm is deterministic, in the abstract sense, even if it is running on hardware that effectively injects some nondeterminism into the mix. I could probably go on forever about that touchpoint, but given that we choose to ignore it, that is all that really matters.

So, in that sense, without respect to reality, we can say that an algorithm is deterministic. Given the same inputs, you will always get the same outputs, every time. More importantly, determinism is a mandatory property of something actually being an algorithm. We do have a term for sets of instructions that are not absolutely reliable, really just best efforts: we call them heuristics. A heuristic will do its best to get an answer, but for any number of reasons, it will not be 100%. It may be 99.9999%, but that 0.0001% failure rate, when done often enough, is actually significant.
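To make that distinction concrete, here is a tiny sketch (the functions are hypothetical, just for illustration): a deterministic max alongside a heuristic one that only samples its input.

    // Deterministic: a pure function of its inputs, so the same input
    // always produces the same output, on every run.
    function exactMax(values: number[]): number {
      let best = -Infinity;
      for (const v of values) {
        if (v > best) best = v;
      }
      return best;
    }

    // Heuristic: best effort only; the random sampling means the answer
    // can differ between runs and will occasionally be wrong.
    function approximateMax(values: number[], samples = 10): number {
      let best = -Infinity;
      for (let i = 0; i < samples; i++) {
        const pick = values[Math.floor(Math.random() * values.length)];
        if (pick > best) best = pick;
      }
      return best;
    }

The first one is an algorithm; the second one, however often it happens to be right, is not.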

All of this is more important than just being a theoretical discussion. What we need from and what people expect from software is determinism. They need software they can rely on, each and every time they go to use it. It is the core unstated requirement of basically every piece of software out there, with the exception of code that we know is theoretically close to being impossible. A heuristic would never do when an algorithm exists.

The classic example of this is hiding in plain sight. A graphical user interface is a ‘pretty’ means of interacting with computers. You do something like press a button on-screen, and that triggers one or more computers to do some work for you. That’s nice.

You press the button, and the work gets done. The work itself should be deterministic. So, each time you press the button, the results are the same.

No doubt people have seen plenty of interfaces where this is not true. In the early days of the web, for example, we had a lot of issues with ‘double clicks’ until we started building in double click protection to ignore the second click if an earlier one was in play. We did that to avoid burning resources, but we also did it to restore some determinism to the interface. People would get annoyed if, for example, they accidentally double-clicked and that caused the software to break or do weird things. It would ‘bug’ them, but really, what it did was violate their expectations that their interaction with the interface was deterministic, which is key.
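As a rough sketch of that kind of double-click protection in a browser (the handler, button, and endpoint names here are hypothetical):

    // Ignore any click while an earlier one is still in play.
    let requestInFlight = false;

    async function onSubmitClick(button: HTMLButtonElement): Promise<void> {
      if (requestInFlight) return;   // a previous click is still being processed
      requestInFlight = true;
      button.disabled = true;        // visible feedback that the click was taken
      try {
        await submitOrder();         // hypothetical call that does the real work
      } finally {
        requestInFlight = false;     // re-arm the button once the work is done
        button.disabled = false;
      }
    }

    // Hypothetical stand-in for the work triggered by the button.
    async function submitOrder(): Promise<void> {
      await fetch("/api/orders", { method: "POST" });
    }

With the guard in place, an accidental second click simply falls through, and the interaction stays deterministic from the user’s point of view.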

So, a single click can and should be deterministic, but what about a series of them?

One of the bad habits of modern programmers is that they push too much of their workload into GUIs. They think that because there is an interface where they can click on everything they need, and each click is in itself deterministic, it must be a good way of getting tasks done. The problem is not the buttons, but what lies between them.

If you always have to click 3 buttons to get a specific result, it is probably fine. But once that grows to 10 buttons, or 50 buttons, or, as it seems in some cases, 100 buttons, the determinism fails rather dramatically. It’s not the software, though; it is the person in between. We are heuristic. Experts strive to be deterministic, but we are battling against our very nature to be absolutely precise absolutely every time. And that plays out, as one might expect, in long button sequences. Sometimes you hit the 100 in the right order, as desired, but sometimes you don’t. Maybe you hit 99 of them, or in the middle, the order is slightly different. The details don’t matter; we know that people are not deterministic, and we can absolutely depend on that being the case.

If you wired up one button to hit the other 100, then you are back to being deterministic again, but if you don’t do that, then using the GUI for any non-trivial task is non-deterministic, simply because people are non-deterministic.

This is exactly why so many old and experienced programmers keep trying to get people to script stuff instead. If you have a script, and you give it the same inputs, then if it was written properly, it will give you the exact same outputs, every time. And it is easy to layer scripts with no arguments on top of scripts that have some variability, which makes it even better.

If you were going to do a big release of complicated software and the release process were a bunch of button clicks in a bunch of different apps, you would be asking for trouble. But if it was just one script called ‘release.sh’ in one place, with no arguments, then your release process would be fully, completely, and totally deterministic.
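As a sketch of that idea, written here as a no-argument Node/TypeScript script rather than shell, with hypothetical commands standing in for the real release steps:

    // release.ts -- hypothetical no-argument release script: every run performs
    // exactly the same steps, in the same order, with the same settings.
    import { execSync } from "node:child_process";

    const steps: string[] = [
      "git checkout main",
      "git pull --ff-only",
      "npm ci",               // install exactly what the lockfile says
      "npm test",
      "npm run build",
      "npm run deploy:prod",  // hypothetical deploy command for this project
    ];

    for (const step of steps) {
      console.log(`running: ${step}`);
      execSync(step, { stdio: "inherit" }); // any failure throws and stops the release
    }

    console.log("release complete");

No arguments, no choices, no place for a human to vary the sequence; the variability has been squeezed out of the process.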

If there is some unwanted variability that you’ve injected into the process, then that acts as a particularly nasty bit of friction. First, it should scare you to do a release if there is a possibility that you might do it incorrectly. Second, when it is incorrect, the cleanup from having messed it up is often quite expensive. What happens then is that it might work a few times initially, but then people get tired and it goes wrong. Then they get scared, and it either slows everything down out of fear or it keeps going wrong, and it makes it all worse.

That then is why determinism is just so important to software developers. It might be easy to play with a GUI and do things, but you’ve given up determinism, which will eventually come back to bite you just when you can’t afford that type of mistake. It’s high risk and high friction, both of which make it harder to get stuff done as needed.

It takes a lot longer to script everything, but once you are on your way, it gets easier and easier as you’ve built up the foundations for getting more and more stuff done. As you go, the scripts get battle-tested, so they rather naturally act as their own test harness. If you fix the scripts instead of avoiding them, you get to this point where tasks like releases are so easy and reliable that there is very little friction to getting them done. The only thing stopping you from doing it too frequently is whether or not they are needed right away. This is the root of ideas like CI/CD pipelines. You’ll have to release often, so it needs to be deterministic.

Determinism plays out in all sorts of other ways within software. And usually the lack of it triggers relatively small side effects that are too often ignored, but build up. If you look for it in the code, in the technologies, in the process, and everywhere else, you find that getting closer to or achieving it drastically reduces friction, which makes the job better and far less painful.

So it’s more than just a type of state machine, the entropy of hardware, or the noise on a network. It is a fundamental necessity for most of the solutions we build.

Friday, July 18, 2025

Anything Goes Style

In anything goes style, you code whatever works. You do not question it; if the results appear to be more or less correct when you run it on your machine, you ship it.

Anything goes style is often paired with brute force style. So you get these mega functions of insanely mixed logic that are deeply nested, and the code often does all sorts of bizarre, wasteful, and disorganized things. Generally, it has more bugs, and they are rarely fixed correctly since the logic is convoluted and fragile.

Anything goes style also burns resources like they are free and is a primary driver of bloat. It uses way more memory than it needs, relentlessly beats the disk to no effect, and litters the network with countless useless packets.

Modern hardware hides it, but when you see a lot of it congregating together, it is obvious that it is spending too much time doing useless work. We often see large software packages growing on disk far faster than their added features would justify.

The style became more popular with languages like PHP and JavaScript, but it got an epic shot of adrenaline with containers. No longer was it obvious that the code was awful when you could just package up the whole development machine and ship that directly, in all its inherent ugliness.

Anything goes is often the coding style at the root of security failures. The code is so obfuscated it can’t be reviewed, and the containers are opaque. That it isn’t doing its work properly isn’t noticed until it is too late and has already been exploited. A variant is to wire up overly expressive dependencies for simple tasks but not lock them down properly, so the whole thing has more holes than Swiss cheese.

Some people argue that it’s a programmer’s job to push out their work as quickly as possible. Why spend extra time making sure infrequent things like security breaches don’t happen? This has led to some epic failures and a growing frustration amongst computer users that software is ruining our world. It is the tragic opposite of engineering. Our job is not to create more software, but rather to solve people's problems with reliable software.

Other styles include:
https://theprogrammersparadox.blogspot.com/2025/06/brute-force-style.html
https://theprogrammersparadox.blogspot.com/2025/05/house-of-cards-style.html
https://theprogrammersparadox.blogspot.com/2023/04/waterloo-style.html

Friday, July 11, 2025

Assumptions

You’re asked to build a big system that solves a complex business domain problem.

But you don’t know anything about the business domain, or the actual process of handling it, and there are some gaping holes in your technology knowledge for the stack that you need to make it all work properly. What do you do?

Your biggest problem is far too many unknowns. Known unknowns and unknown unknowns. A big difficulty with software development is that we often solve this by diving in anyway, instead of addressing it proactively.

So we make a lot of assumptions. Tonnes of them.

We usually work with a vague understanding of the technologies. Either we ignore the business domain, or our understanding is so grossly over-simplified that it is dangerous. This is why there is so little empathy in our fragile creations.

It would be nice if this changed, but it does not, and has only gotten worse with time.

So instead, we need to deal with it.

First is to assume that almost everything is an assumption. Second is to insert enough flexibility into the work so that only minimal parts of it are lost if your assumptions are wrong.

For technical issues, on occasion, you can blindly guess correctly. More often, if you just follow the trends (for example, in a GUI, do whatever everybody else is doing), your choice is less likely to change. It’s a mixed bag, though, in that some super popular trends are actually really bad ideas, so it’s good to be a little sceptical. Hedge your bets and avoid things that are just too new and don’t have staying power.

But for business stuff, which is often far away from what you imagine it to be, it is never easy. The obvious answer is to go learn about how it actually works, or get an expert and trust them fully. Often, that is not an option.

The other side is to be objective about it. Is it actually something that could be handled in multiple different ways? And how many possible variations in handling it can you imagine?

Valuations and pricing are good examples where people are usually very surprised at how different the actual reality is from what they might have guessed. Mostly because the most obvious ways of dealing with them are not practical, and a lot of history has flowed under the bridge already. If you have zero real exposure and you guess, it will most certainly be wrong.

The key is that if you do not know for certain, the code itself should not be static. That is, the code mirrors your certainty about your own assumptions: static if you are absolutely 1000% certain, dynamic if you are not.

If you think there might be ten ways to do something, then you implement the one you guessed is likely and make it polymorphic. As others pop up, it is easy to add them too. It takes a little more effort to make something encapsulated and polymorphic, but if you are right about being wrong, you just saved yourself some big trouble and a few bad days.
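As a rough sketch of that kind of hedge, using a hypothetical pricing example (the interface and rules here are made up, not from any real system):

    // Hedge the assumption: callers depend on a small interface, and the
    // guessed behaviour is just the first of possibly many implementations.
    interface PricingStrategy {
      price(quantity: number, unitCost: number): number;
    }

    // The variation we guessed is most likely.
    class FlatMarkupPricing implements PricingStrategy {
      constructor(private markup: number) {}
      price(quantity: number, unitCost: number): number {
        return quantity * unitCost * (1 + this.markup);
      }
    }

    // When the guess turns out to be wrong, add another implementation
    // instead of rewriting the callers.
    class TieredPricing implements PricingStrategy {
      price(quantity: number, unitCost: number): number {
        const discount = quantity > 100 ? 0.1 : 0;
        return quantity * unitCost * (1 - discount);
      }
    }

    // Callers only ever see the interface, so swapping strategies is cheap.
    function quote(strategy: PricingStrategy, quantity: number, unitCost: number): number {
      return strategy.price(quantity, unitCost);
    }

The extra seam costs a few lines now, but if the pricing assumption is wrong, the fix is a new class, not a rewrite.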

Flipping that around, scope creep isn’t often really scope creep, but more a matter of assumption convergence. People assumed that a simple, trivial feature would do the trick, but at some point they were enlightened into realizing that was incorrect, so now the code has to do far more than they initially believed it should. Knowledge was gained; the design and implementation should be updated to reflect that, and what already exists should be properly refactored.

In development projects where the coders don’t want to know anything about the underlying business problems, they get angry at the domain experts for not having known this sooner. In projects where the coders care about the outcomes, they are keen to resolve this properly. The difference is whether you see the job as churning specifications into code or as solving people's problems with code.

A while back, there was a lot of resistance to what was termed speculative generalization. If you could save yourself a few days by not making something encapsulated or polymorphic, it was argued that you should save those days. The problem was that when paired with a lack of caring about what the code was supposed to do, stubbornness in insisting that nothing should change just generated a lot of drama. And that drama and all of the communication around it eats up a tremendous amount of time. The politics flows fast and furious, so it drains the life out of everything else in the project. Everybody’s miserable, and it has eaten far more time than if you just made the change. People used to blame this on the waterfall process, but it is just as ugly and messy in lightweight methodologies.

With that in mind, a little extra time spent avoiding that larger and more difficult path is a lot of time saved. The caveat is that you should not really forecast where the code base will grow, but instead just work to hedge your own lack of certainty.

It’s a shifting goal, though. As you build more similar things, you learn more and assume less. You know what can be static and what should likely be dynamic. Any other developer will disagree with you, since their experiences and knowledge are different. That makes it hard to get all of the developers on the same page, but development goes way smoother if they are all on the same page and can interchange and back each other up. That is why small, highly synced “tiger” teams can outcode much bigger projects.

It can be hard, when something is counterintuitive, to convince others that it is what it is. That is a trust and communication issue between the developers themselves. Their collective certainty changes the way they need to code. If it’s mandated from above or externally, it usually assumes total uncertainty, and so everything is dynamic and thus overengineered. That worst-case scenario is why it aggravates people.

The key, though, is always being objective about what you actually know for certain. If you can step back and not take it personally, you can make good choices in how to hedge your implementation and thus avoid all sorts of negative outcomes. If you get it nearly right, and you’ve focused on readability, defensive coding, and all the other proactive techniques, then releasing the code will be as smooth as you can get it.

If you go the other way and churn out a ball of mud, the explosion from releasing it will be bigger than the work to create it. Just as you think it is going out the door and is over, it all blows up in your face, which is not pleasant. They’ll eventually forgive you for being late if the release was smooth, but most other negative variations are often fatal.

Thus, the adage “assumptions kill”, but in an industry that is built around and addicted to assumptions, you are already dead, you just don’t know it yet.

Friday, July 4, 2025

Industrial Strength

Software that continues to correctly solve a problem, no matter what chaos surrounds it, is industrial strength. It just works, it always works.

It is about reliability and expectations. The code is there when you need it and it behaves in exactly the way you knew it would.

You can be certain that if you build on top of it, it won’t let you down. There will always be problems, but the industrial-strength stuff is rarely one of them.

If code isn’t industrial strength, it is a toy. It may do something cute; it may be clever. The results could be fascinating and highly entertaining. But it is still a toy.

You don’t want to build something serious on top of toys. They break when least expected, and they’ll do strange things periodically. They inject instability into whatever you are doing and whatever you have built on top.

Only toys can be built on other toys; they’ll never be industrial strength. Something fragile that breaks often can’t be counterbalanced properly. Contained perhaps, but the toy nature remains.

Lots of people market toys as if they were industrial strength. They are not, and they are highly unlikely to ever be. Toys don’t “mature” into industrial strength; they just get more flaky as time goes on.

Industrial strength is a quality that has to be baked into every facet of the design right from day one. It is not accidental. You don’t discover it or blunder into it. You take the state-of-the-art deep knowledge, careful requirements, and desired properties, then you make difficult tradeoffs to balance out the competing concerns. Industrial strength is always intentional and only ever by people who really understand what it means.

There is nothing wrong with toy software; it can be instructive, fun, or entertaining. But for the things that we really depend on, we absolutely do not want toy software involved. Some problems in life are serious and need to be treated seriously.