Thursday, November 27, 2025

Software Failures

Way back, maybe fifteen years ago, when I wrote that software projects were failing just as often as they had in the Waterfall era, lots and lots of people said I was wrong.

They insisted that the newer lightweight reactive methodologies had “fixed” all of the issues. But if you understand what was going wrong with these projects, you’d know that was impossible.

So it’s nice to see a modern perspective confirm what I said then:

https://spectrum.ieee.org/it-management-software-failures

The only part of that article I could disagree with is that it was a bit too positive towards Agile and DevOps. It does hedge itself at the end of the paragraph, but the writing still has an overly positive marketing vibe. “Proved successfully” should have been toned down to “claimed successfully”, which is a bit different and a lot more realistic.

If you lumped in all of the software that doesn’t even pay for itself, you’d see that the situation is much worse in most enterprises. Millions of lines of fragile code that do just barely enough, while convoluting everything else. It’s pretty ugly out there. A big digital data blender.

From my perspective, the chief problem of our industry is expectations. When non-technical people moved in to take control of development projects, they misprioritized things so badly that the work commonly spins out of control.

If a development project is out of control, the best case is that it will produce a lame system; the worst case is that it will be a total outright failure. It is hard to come back from either consequence.

If we want to fix this, we have to change the way we are approaching the work.

First, we have to accept that software development is extremely, extremely slow. There are no silver bullets, no shortcuts, no vibing, no easy or cheap ways out. It is a lot of work, it is tedious work, and it needs to be done carefully and slowly.

Over the 35 years of my career, the pace just keeps getting faster, but with every jump up, the quality keeps getting worse. Since you need a minimal level of quality for the result not to be lame or a failure, you need a minimal amount of time to get there. Try to skimp on that, and it falls apart.

Hacking might be a performance art, but programming is never that. It is a slow, intense slog bordering on engineering. It takes time.

Time to design, time to ramp up, time to train, time to learn, time to code, time to test. Time, time, time.

If the problem is that you are trying to race through the work in order to keep the budget under control, the problem is that you are racing through the work. So, slow it down. Simple fix.

For any serious system, it takes years and years to reach maturity. Trying to slam part of it out in six months, then, is more than a bit crazy. Libraries and frameworks don’t save you. SAAS products don’t save you. Being overly reactive and loose with lightweight methodologies doesn’t save you either, and can actually fuel the problems, making it worse, not better.

If you want your software to work properly, you have to put in the effort to make it work properly. That takes time.

The other big issue is that the group of people you assemble to build a big system matters a whole lot. A huge amount.

Programmers are not easily replaceable cogs. An all-junior team that is barely functional may be far less expensive, but it is also a massive risk. The resulting system is already in big trouble before any code is written.

The people you put in charge of the development work really matter. They need to be skilled, experienced, and understand how to navigate some pretty complicated and difficult tradeoffs. Without that type of background, the work gets lost and then spins out of control.

It’s very common to see too much focus dumped on trivial visible interface issues while the underlying mechanics are hopelessly broken. It’s like worrying about the cup holder in your car when the engine block is cracked. You need someone who knows this, has lived this, and can avoid this.

As well, enough of the team needs to have significant experience too. Experience is what keeps us from making a mess, and a mess is the easiest thing to create with software. So, a gifted team of developers, mixed between juniors and seniors and led by experienced hands, is pretty much a prerequisite to keeping the risks under control.

Software development has always been about people, what they know, and what they can build. They are the most important resource in building and running large software systems. If you don’t have enough skilled people, nothing else matters. No methodology, process, or paperwork can save you. You lack the talent to get it done. Simple.

That’s mostly the roots of our problems. Not enough time, and not taking the staffing issues seriously enough. Fix those two, and most development gets back on the rails. Nurture them, and most things built out of a strong shop are pretty good. From there, you can decide how high the quality should be, or how to streamline the work, or strategize about direction, but without that concrete foundation, you are lucky if any of it runs at all, and luckier still if the crashes aren’t too epic.

Thursday, November 20, 2025

Integrations

There are two primary ways to integrate independent software components, we’ll call them ‘on-top’ and ‘underneath’.

On top means that the output from the first piece of software goes in from above to trigger the functionality of the second piece of software.

This works really well if there is a generalized third piece of software that acts as a medium.

This is the strength of the Unix philosophy. There is a ‘shell’ on top, which is used to call a lot of smaller, well-refined commands underneath. The output of any one command goes up to the shell, and then down again to any other command. The format is unstructured or at least semi-structured text. Each command CLI takes its input from stdin and command line arguments, then puts its output to stdout, and splits off any errors into stderr. The ‘integration’ between these commands is ‘on top’ of the CLI in the shell. They can all pipe data to each other.
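The shell-and-pipes idea can be modeled in a few lines of Python. This is only a toy sketch of the ‘on-top’ shape: the command names and the `pipeline` helper are made up for illustration, standing in for real CLI commands and the shell itself.

```python
# A toy model of 'on-top' integration: a tiny "shell" pipes text
# between small, independent "commands". Neither command knows the
# other exists; the medium on top does all the wiring.

def sort_lines(text: str) -> str:
    """Command 1: sort the input lines (like `sort`)."""
    return "\n".join(sorted(text.splitlines()))

def unique_lines(text: str) -> str:
    """Command 2: drop adjacent duplicate lines (like `uniq`)."""
    out, prev = [], object()
    for line in text.splitlines():
        if line != prev:
            out.append(line)
        prev = line
    return "\n".join(out)

def pipeline(text: str, *commands) -> str:
    """The 'shell': feeds each command's output into the next."""
    for cmd in commands:
        text = cmd(text)
    return text

result = pipeline("b\na\nb\na", sort_lines, unique_lines)
```

Because the only contract is “text in, text out”, any command can be composed with any other, which is exactly the flexibility the shell exploits.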

This proved to be an extremely powerful and relatively consistent way of integrating all of these small parts together in a flexible way.

Underneath integrations are the opposite. The first piece of software keeps its own configuration data for the second piece and calls it directly. There may be no third party, although some implementations of this ironically spin up a shell underneath, which then spins up the other command. Sockets are also commonly used to communicate, but they depend on the second command already being up and running and listening on the necessary port, so they are less deterministic.

A lot of modern software prefers the second type of integration, mostly because it is easier for programmers to implement it. They just keep an arbitrary collection of data in the configuration, and then start or call the other software with that configuration.

The problem is that even if the configuration itself is flexible, this is still a ‘hardwired’ integration. The first software must include enough specific code in order to call the second one. The second one might have a generic API or CLI. If it needs the output, the first software needs to parse the output it gets back.
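The ‘hardwired’ shape looks roughly like this in Python. Again a sketch with invented names: the point is that the caller embeds configuration for the callee, invokes it directly, and must parse the callee’s output back apart.

```python
# A toy model of 'underneath' integration: the first component keeps
# its own config for the second and calls it directly.

def report_service(fmt: str, rows: list) -> str:
    """The second piece of software, with its own specific interface."""
    if fmt == "csv":
        return "\n".join(",".join(map(str, r)) for r in rows)
    raise ValueError(f"unknown format: {fmt}")

class Exporter:
    """The first piece of software. It hardwires knowledge of
    report_service: its name, its arguments, and its output format."""

    def __init__(self):
        # Configuration *about the other component* lives here.
        self.config = {"format": "csv"}

    def export(self, rows):
        raw = report_service(self.config["format"], rows)
        # The caller must also parse the callee's output itself.
        return [line.split(",") for line in raw.splitlines()]

result = Exporter().export([(1, 2), (3, 4)])
```

Note how any change to `report_service`’s arguments or output format breaks `Exporter` silently; that coupling is the fragility described below.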

If the interaction is bi-directional and a long-running protocol, this makes a lot of sense. Two programs can establish a connection, get agreement on the specifics, and then communicate back and forth as needed. The downside is that both programs need to be modified, and they need to stay in sync. Communication protocols can be a little tricky to write, but are very well understood.

But this makes a lot less sense if the first program just needs to occasionally trigger some ‘functionality’ in the second one. It’s a lot of work for an infrequent and often time-insensitive handoff. It is better to get the results out of the program and back up into some other medium, where it can be viewed and tracked.

The top-down approach is considerably more flexible, and thanks to the third party, it is far easier to diagnose problems. You can get a high-level log of the interaction, instead of having to stitch together parts of a bunch of scattered logs. Identifying where a problem originated in a bunch of underneath integrations is a real nightmare.

Messaging backbones act as third parties as well. If they are transaction-oriented and bi-directional, then they are a powerful medium for different software to integrate. They usually define a standard format for communication and data. Unfortunately, they are often vendor-specific, very expensive, locked in, and can have short lifespans.

On-top integrations can be a little more expensive in resources. They are slower, use more CPU, and it is more costly to format data into a common format and then parse it back into the specifics. So they are not preferred for large-scale, high-performance systems. But they are better for low-volume or infrequent interactions.

However, on-top integrations also require a lot more cognitive effort and pre-planning. You have to carefully craft the mechanics to fit well into the medium. You essentially need a ‘philosophy’, then a bunch of implementations. You don’t just randomly evolve your way into them.

Underneath integrations can be quite fragile. When there are a lot of them, they are heavily fragmented; the configurations are scattered all over the place. If there are more than 3 of them chained together, it can get quite hairy to set them up and keep them running. Without some intensive tracking, unnoticed tiny changes in one place can manifest later as larger mysterious issues. It is also quite a bit harder to reason about how the entire thing works, which causes unhappy surprises. Equally problematic is that each integration is very different, and all of these inconsistencies and different idioms increase the likelihood of bugs.

As an industry, we should generally prefer on-top integrations. They proved to be powerful and reliable for decades for Unix systems. It’s just that we need more effort in finding expressive generalized data passing mechanisms. Most of the existing data formats are far too optimized for limited sub-cases or are too awkward to implement correctly. There are hundreds of failed attempts. If we are going to continue to build tiny independent, distributed pieces, we have to work really hard to avoid fragmentation if we want them to be reliable. Otherwise, they are just complexity bombs waiting to go off.

We’ll still need underneath integrations too, but really only for bi-directional, extensive, high-speed, or very specific communication. They should be the exception -- an optimization -- rather than the rule. They are easier to implement, but they are also less effective and a dangerous complexity multiplier.

Thursday, November 13, 2025

Unknown unknowns

If I decided to build a house all on my own, I am pretty sure I would face lots of unexpected problems.

I am comfortable building something like a fence or a deck, but those skills and the knowledge I gained using them are nowhere close to what it takes to build a house.

What does it take to build a house? I have no clue. I can look at already built houses, and I can watch videos of people doing some of the work, but that isn’t even close to enough information to empower me to just go off and do it on my own.

If I tried, I would surely mess up.

That might be fine if I were building a little shed out back to store gardening tools. Whatever mess I created would probably not result in injuries to people. It’s a very slim possibility.

But knowing that there are a huge number of unknown unknowns out there, I would be more than foolish to start advertising myself as a house builder, and even sillier to take contracts to build houses.

If a building came tumbling down late one night, it could very likely kill its occupants. That is a lot of unnecessary death and mayhem.

Fortunately, for my part of the world, there are plenty of regulators out there with building codes that would prevent me from making such a dangerous mistake.

The building codes are usually specific in how to do things, but they were initially derived from real issues that explain why they are necessary.

If I were to carefully go through the codes, I am sure that their existence -- if I pondered hard enough -- would shed light on some of those unknown unknowns that I am missing.

There might be something specific about roof construction that was driven by roofs needing to withstand a crazy amount of weight from snow. The code mandates the fix, but the reason for seemingly going overboard on the tolerances could be inferred from the existence of the code itself. “Sometimes there is a lot of extra weight on roofs”.

Rain, wind, snow, earthquakes, tsunamis, etc. There are a series of low-frequency events that need to be factored into any construction. They don’t occur often, but the roof needs to survive if and when they manifest themselves.

Obviously, it took a long time and a lot of effort over decades, if not centuries, to build up these building codes. But their existence is important. In a sense, they separate out the novices from the experts.

If I tried to build a house without reading or understanding them, it would be obvious to anyone with a deeper understanding that I was just not paying attention to the right areas of work. The foundations are floppy or missing, the walls can’t hold up even a basic roof, and the roof will cave in under the lightest of loads. The nails are too thin; they’ll snap when the building is sheared. It would be endless, really, and since I don’t know how to build a house, I certainly don’t know all of the forces and situations that would cause my work to fail.

I’ve always thought that it was pretty obvious that software needs building codes as well.

I can’t count the number of times that I dipped into some existing software project only to find that problems that I find very obvious, given my experiences, were completely and totally ignored. And that, once the impending disasters manifested themselves, everybody around me just said “Hey, that is unexpected”, when it was totally expected. I’ve been around the block; I knew it was coming.

Worse is that whenever I tried to forewarn them, they usually didn’t want to listen. Treated me as some old paranoid dude, and went happily right over the cliff.

It gets so boring having to say “I told you so” that at some point in my career, I just stopped doing it. I stuck with “you can lead a horse to water, but you cannot make it drink” instead.

And that is where building codes for software come in. As a new developer in an existing project, I often don’t carry much weight, but if there were an official reference of building codes that covered the exact same thing, the problem would be easy to prevent. “You’ve violated code 4.3.2; it will cause a severe outage one day” is better than me trying to explain why the novice blog posts they read that said it was a good idea are so horribly wrong.

Software development is choked with so many myths and inaccuracies that wherever you turn, you bump into something false, like trying to run quickly through a papier-mâché maze without destroying it.

We kinda did this in the past with “best practices”, but it was informal and often got co-opted by dubious people with questionable agendas. I think we need to try again. This time, it is a bunch of “specific building codes” that are tightly versioned. They start by listing out strict ‘must’ rules, then maybe some situationally optional ones, and an appendix with the justifications.


It’s oddly very hard to write, and harder to keep it stack-, vendor-, and paradigm-neutral. We should probably start by being very specific, then gradually consolidate those codes into broader, more general ones.

It would look kinda like:

1.1.1 All variable names must be self-describing and synchronized with any relevant outside domain or technical terminology. They must not include any cryptic or encoded components.

1.2.1 All function names must be self-describing and must clearly indicate the intent and usability of the code that they encapsulate. They must not include any cryptic or encoded components, unless mandated by the language or usage paradigm.

That way, if you came across a function called FooHandler12Ptr, you could easily just say it was an obvious violation of 1.2.1 and add it as a bug or a code review fix.
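Some of these codes could even be checked mechanically. Here is a toy checker for a rule like the hypothetical 1.2.1 above; the rule number, the pattern of “cryptic components”, and the function name are all illustrative, not an attempt to define the real codes.

```python
import re

# A toy checker for the hypothetical code 1.2.1: flag function names
# that end in cryptic components -- trailing digits or common encoded
# abbreviations. The pattern here is just a sketch of the idea.
CRYPTIC = re.compile(r"(\d+|Ptr|Tmp|Obj|Mgr)$")

def check_1_2_1(function_name: str) -> list:
    """Return a list of violation messages for one function name."""
    violations = []
    if CRYPTIC.search(function_name):
        violations.append(
            f"{function_name}: violates 1.2.1 (cryptic or encoded component)"
        )
    return violations
```

A check like this would never catch every bad name, but it turns part of the argument from opinion into a mechanical, versioned rule.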

In the past, I have worked for a few organizations that tried to do this. Some were successful, some failed miserably. But I think that in all cases, there was too much personality and opinion buried in their efforts. So, the key part here is that each and every code is truly objective. Almost in a mathematical sense, they are all ‘obviously true’ and don’t need to be broken down any further.

I do know that, given the nature of humanity, there is at least one programmer out there in this wide world who currently believes that ‘FooHandler12Ptr’ isn’t just a good name, it should also be considered best practice. For each code, I think there needs to be an appendix, and that is where the arguments and justifications should rest. It is for those people adventurous enough to want to argue against the rules. There are plenty of romanticized opinions and variations on goals; our technical discussions quickly get lost in very non-objective rationales. That should be expected, and the remedy is for people with esoteric views to simply produce their own esoteric building codes. The more, the merrier.

Of course, if we do this, it will eat up some time, both to write up the codes and to enforce them. The one ever-present truth of most programming is that there isn’t even close to enough time to spare, and most managements are chronically impatient. So, we sell adherence to the codes as a ‘plus’, particularly for commercial products or services. “We are ‘XXX 3.2.1 compliant’” is a means of really asserting that the software is actually good enough for its intended usage. In an age where most software isn’t good enough, at some point this will become a competitive advantage, and a bit later a necessity. Just need a few products to go there first, and the rest will have to follow.

Thursday, November 6, 2025

Intent

Recently, I was reading some AI-generated code.

At the high level, it looked like what I would expect for the work it was trying to do, but once I dug into the details, it was kinda bizarre.

It was code that one might expect from a novice programmer who was struggling with programming. Its intent was muddled. Its author was clearly confused.

It does help, though, for a discussion of readability.

Really good code just shows you what it is going to do, really easily. It makes it obvious. You don’t need to think too hard or work through the specifics. The code says it is going to do X, and the code does that X, as you would expect. Straightforward, and totally boring, as all good code should be.

In thinking about that, it all comes down to intentions. “What did the programmer intend the code to do?” Is it some application code that moves data from the database and back to the GUI again? Does it calculate some domain metric? Is it trying to span some computation over a large number of resources?

To make it readable, the programmer has to make their intent clear.

The most obvious thing is that if there is a function called GetDataFromFile, which gets data from the database, you know the stated intent is wrong, or obscured, or messed up. Shouldn’t the data come from a file? Why is it going to a database underneath? Did they set it up one way and duct-tape it later, without bothering to properly update the name?

If the code is lying about what it intends to do, it is not readable. That’s an easy point, in that if you have to expend some cognitive effort to remember all of the places where the code is lying to you, that is just unnecessary friction getting in your way. Get enough misdirection in the code, and it is totally useless, even if it compiles. Classic spaghetti.

Programming is construction, but it is also, unfortunately, a performance art. It isn’t enough to just get the code in place; you also have to keep it going, release after release.

Intent also makes it clear for the single responsibility issues.

If the intent of a single function is to “get data from the db”, “... AND to reformat it”, “... AND to check it for bad data”, “... AND to ....” then it’s clear that the function is not doing just one thing, it is doing a bunch of them.

You’d need a higher function that calls all of the steps: “get”, “reformat”, “validate”, etc., as functions. Its name would indicate that it is both grabbing the data and applying some cleanup and/or verification to it. Raw work and derived work are very different beasts.
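That layering can be sketched in a few lines of Python. All of the function names and the stubbed data here are illustrative; the point is that the higher function’s name states the combined intent, while each step keeps a single responsibility.

```python
# A sketch of the layering described above: single-purpose steps
# composed by one higher-level function whose name says what the
# whole thing does.

def get_records():
    """Single job: fetch the raw data (stubbed here for illustration)."""
    return ["  alice ", "", "  bob "]

def reformat_records(records):
    """Single job: normalize the raw form."""
    return [r.strip() for r in records]

def validate_records(records):
    """Single job: drop records that fail basic checks."""
    return [r for r in records if r]

def get_clean_records():
    """The higher function: 'get' plus cleanup, stated in the name."""
    return validate_records(reformat_records(get_records()))
```

A reader can confirm what `get_clean_records` does from its three calls without ever descending into the steps themselves.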

Programmers hate layering these days, but muddling a bunch of different things together into one giant function not only increases the likelihood of bugs, it also makes it hard for anyone else to understand what is happening. Nobody ever means to write DoHalfOfTheWorkInSomeBizarreInterlacedOrder, but that should really be a far more common function name in a lot of codebases out there. The intent of the coder was to avoid typing out separate functions and having to name them. What the code itself was doing was forgotten.

If you decompose the code nicely into decent bite-sized chunks and give each chunk a rational, descriptive name, it is pretty easy to follow what the code is trying to do. If it is layered, then you only need to descend into the depths if there is an underlying bug or problem. You can quickly assert that GenerateProgressReport does the 5 steps that you’d expect, and move on. That is readable, easy to understand, and you probably don’t need to go any deeper. You now know what those 5 steps are. You need that capability in huge systems; there is often more code there than you can read in a lifetime. If you always have to see it with all of the high and low steps intertwined together, it cripples your ability to write complex or correct code.

In OO languages, you can get it even nicer: progress.report.generate() is nearly self-documenting. The nouns are stacked in the way you’d expect them, even though “progress” is really a nounified verb. If the system were running overnight batches, and the users occasionally wanted to check in on how it was going, that is where you’d expect to find the steps involved. So, if there was a glitch in the progress report, you pretty much know where that has to be located.

A long, long time ago, in the Precambrian days of OO, I remember watching one extremely gifted coder in action. He was working with extremely complicated graphic visualizations. As he’d see a problem on the screen while testing, he had structured his code so well that he pretty much knew exactly which line in it was wrong. That kind of code is super readable, and the readability has an extra property making it highly debuggable as well. The bug nicely tells you exactly where in the code you have a problem. That is a very desirable property.

His intent was to render this complicated stuff; his code was written in a way that made it easy to know whether it was doing the right thing or not. This let him quickly move forward with his work. If his code had been badly named spaghetti, it would have taken several lifetimes to knock out the bugs. Anybody who does not think those readability and debuggability properties are necessary doesn’t realize how much more work they’ve turned it into, how much time they are wasting.

If the intent of the code is obscured or muddled, it limits the value of the code. That’s why we have comments, for adding in extra commentary that the code itself cannot express. Run-once code, even if it works, is too expensive in time to ever pay for itself. You don’t want to keep writing nearly identical pieces of code each time when you can just solve it once in a slightly generalized fashion, and then move on to other, larger pieces of work.

It takes a bit of skill to let intent shine through properly. It isn’t something intuitive. The more code from other people you read, the more you learn which mistakes hurt the readability. It’s not always obvious, and it certainly changes a bit as you get more and more experience.

You might, for example, put in a clear idiom that you have seen in the past often, but it could confuse less experienced readers. That implies that you have to stick to the idioms and conventions that match the type of code you are writing. Simple application idioms for the application’s code, and more intricate system programming idioms for the complex or low-level stuff. If you shove some obscure functional programming idiom into some basic application code, it will be hard for other people to read. There is a more ‘application coding’ way of expressing the code.

It is a lot like writing. You have to know who you are coding for and consider your audience. That gives you the most readable code for the work you are doing. It is necessary these days because most reasonably sized coding projects are a team sport now.

It’s worth noting that deliberately hiding your intent and the functionality of the code in an effort to get ‘job security’ is generally unethical. Maybe if it’s some code that you and a small group of people will absolutely maintain for its entire lifetime, it might make sense, but that is never actually the case. More often, we write stuff, then move on to writing other stuff.

Friday, October 31, 2025

The Structure of Data

A single point of data -- one ‘variable’ -- isn’t really valuable.

If you have the same variable varying over time, then it might give you an indication of its future behavior. We like to call these ‘timeseries’.

You can clump together a bunch of similar variables into a ‘composite variable’. You can mix and match the types; it makes a nexus point within a bunch of arbitrary dimensions.

If you have a group of different types of variables, such as some identifying traits about a specific person, then you can zero in on uniquely identifying one instance of that group and track it over time. You have a ‘key’ to follow it, you know where it has been. You can have multiple different types of keys for the same thing, so long as they are not ambiguous.
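A sketch of that in Python: a composite variable with a couple of identifying traits, and two different keys pointing at the same instance. The field names and values here are invented for illustration.

```python
# A 'composite variable' (entity) with multiple unambiguous keys.
from dataclasses import dataclass

@dataclass(frozen=True)
class Person:
    employee_id: int
    email: str
    name: str

people = [Person(7, "kim@example.com", "Kim")]

# Two different keys can identify the same instance, so long as
# each key is unambiguous within the collection.
by_id = {p.employee_id: p for p in people}
by_email = {p.email: p for p in people}
```

Either lookup lands on the identical object, which is what lets you follow the same instance over time through whichever key is handy.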

You might want to build up a ‘set’ of different things that you are following. There is no real way to order them, but you’d like them to stay together, always. The more data you can bring together, the more value you have collected.

If you can differentiate for at least one given dimension, you can keep them all in an ordered ‘list’. Then you can pick out the first or last ones over the others.

Sometimes things pile up, with one thing above a few others. A few layers of that and we get a ‘tree’. It tends to be how we arrange ourselves socially, but it also works for breaking down categories into subcategories or combining them back up again.

Once in a while, it is a messy tree. The underlying subcategories don’t fit uniquely in one place. That is a ‘directed acyclic graph’ (DAG), which also lends itself to some optimizing forms of memoization.
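That memoization opportunity can be shown with a tiny sketch. The graph below is invented for illustration: one subcategory hangs under two parents, making it a DAG rather than a tree, so caching means the shared node is only ever computed once.

```python
# A DAG whose shared subcategory is computed once via memoization.
from functools import lru_cache

# Each node maps to its sub-nodes; 'shared' sits under two parents,
# so this is a DAG, not a tree.
DAG = {
    "root": ["a", "b"],
    "a": ["shared"],
    "b": ["shared"],
    "shared": [],
}

calls = []  # record which nodes were actually evaluated

@lru_cache(maxsize=None)
def count_leaves(node: str) -> int:
    calls.append(node)
    children = DAG[node]
    return 1 if not children else sum(count_leaves(c) for c in children)

total = count_leaves("root")
```

In a tree, every path down is unique; in a DAG, caching the shared subproblems is exactly the memoization trick that makes the structure cheap to traverse.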

When there is no hierarchical order to the whole thing it is just a ‘graph’. It’s a great way to collect things, but the flexibility means it can be dangerously expensive sometimes.

You can impose some flow, making the binary edges into directional ones. It’s a form of embedding traits into the structure itself.

But the limits of an edge that joins only two entries may be too imposing, so you could allow edges that connect more than two, which is called a ‘hypergraph’. These are rare, but very powerful.

We sometimes use the term ‘entity’ to refer to our main composite variables. They relate to each other within the confines of these other structures, although we look at them slightly differently in terms of, say, 1-to-N relationships, where both sides are effectively wrapped in sets or lists. It forms some expressive composite structures.

You can loosely or tightly structure data as you collect it. Loose works if you are unsure about what you are collecting; it is flexible, but costly. Tight tends to be considerably more defensive, with fewer bugs and better error handling.

It’s important not to collect garbage; it has no inherent value, and it causes painful ‘noise’ that makes it harder to understand the real data.

The first thing to do when writing any code is to figure out all of the entities needed and to make sure their structures are well understood. Know your data, or suffer greatly from the code spiraling out of control. Structure tends to get frozen far too quickly; just trying to duct tape over mistakes leads to massive friction and considerable wasted effort. If you misunderstood the structure, admit it and fix the structure first, then the code on top.

Monday, October 27, 2025

Fishbowl

I’ve often felt that software development projects were representative fishbowls for the rest of reality.

They have the same toxic mix that we see everywhere else, just on a smaller scale. It’s technology mixed with people mixed with business mixed with time.

Because of its turbulent history, technology is an ever-growing mess. It kinda works, but it's ugly and prone to glitches. We’ve spent decades desperately trying to avoid applying any serious engineering to our efforts.

People all have their own personal agendas that they often prioritize over the success of the collective work.

Some executives would prefer a low-quality early release so they can quickly claim success and move on. Software developers often pick inappropriate technologies to check off boxes on their resumes. All sorts of other players poke their fingers into the pie, hoping to make their careers.

People do what they think is best for themselves, which can be negative overall.

Meanwhile, they are circled by sales and business sharks hoping for a quick buck. They’ll promise anything if it will get their foot in the door. Lots of money is at stake. The software industry is slimy.

As the clock is ticking, too often the domain and engineering aspects fall to the ground. People stop caring about building solid tech that solves the user’s problem; they are more focused on their own issues.

This chaos devolves into producing a mess. Stuff tossed together too quickly for all the wrong reasons. The codebase turns to mud. It becomes a time vortex, with people desperately circling the edge, trying not to get sucked further in.

What usually works in software development is to return to the real underlying priorities. Ignore the surrounding circus. Keep producing reasonable code that goes as far as it can to really solve deep problems. It needs to be neat and consistent. A clean workspace avoids a lot of friction. All of the little details need attention. Programming involves a lot of patience.

If the codebase is solid, all the other problems remain at bay. If the codebase collapses, it opens the floodgates and lets the game get way out of control.

In that sense, it is all about controlling and minimizing the complexities. Fighting all the time to keep the artificial complexities from spawning, while making sure the inherent ones are satisfied.

Mastering this requires both a lot of knowledge and a lot of experience. It is a tricky juggling act. Keeping a million little balls in motion and off the ground while people scream at you about time.

That’s why people promising simple answers to these types of complex situations are inevitably destructive. They pick a few balls and improve those while the rest fall to the ground with a resounding thud. It seems to work right up until it inevitably collapses.

You can eliminate as much of the artificial complexity as possible, but never any of the inherent complexity. It remains and cannot be ignored. In software, you either have a reasonable codebase or you have a mess. This seems to be true elsewhere in meatspace as well.

Thursday, October 16, 2025

Patience

The biggest difference between now and when I started programming 35 years ago is patience.

Many of the people who commission software development projects are really impatient now.

The shift started with the dot-com era. There was a lot of hype about being the first into any given market. So, lots of people felt that it was better to go in early with very low quality than to wait and produce something that was well refined.

That made sense then; a lot of those early markets were brand new, and many of the attempts to establish them were total failures. It didn’t make a lot of sense to invest heavily in building a great piece of software if, in the end, nobody would want it anyway.

In the crater left behind, the industry shifted heavily to reactivity. Forget any sort of long-term planning or goals; just survive in the short term, throwing together whatever people say they want. That is a recipe to create a mess, but recreating that mess over and over again kept people busy.

Behind the scenes, we started sharing code a lot more. When I started coding, you had to write everything yourself. That took a long time, but if you were good, it also provided really great quality.

As more code became available, people would blindly throw in all sorts of stuff. It would bump up the functionality rapidly, but it also tended to bloat the code and leave a lot of dark corners in the codebase. They would wire up stuff that they barely understood, and it would seem to work for a while, only to end in tears.

Because of that, someone could toss together a quick demo that was really promising with a few neat features, without understanding that a real serious version of the same thing would require exponentially more effort. It started with websites, but quickly infected all software development. Fast-glued balls of mud became the de facto base for lots of systems, and they scale really poorly.

As the web dominated even more, since there were so many available components, and documentation never really matured, Q&A sites emerged. If you're rushing through a piece of work, with impatient people screaming at you, you can jump online, grab some example code, and slap it in. It just amplified the problems.

Mobile phones compounded the effect. An endless stream of noise made it hard to think deeply about anything. But shallow knowledge is effectively low-quality knowledge. You might know how to combine a bunch of things together, but when it doesn’t work as expected, there is very little you can do about it, except try again.

There are all sorts of trends about scaling software, and people get sucked into believing that it should be easy, but the first major failure point is the ability of people to deal with a big, ugly, messy, poorly constructed codebase. You will never get any sort of effective or reasonable behavior out of a pile of stuff that you don’t understand. Scaling requires deep knowledge, but impatience prevents us from acquiring that.

So I find it frustrating now. People run around making great claims about their software, but most of it is ugly, bloated, and buggy. We’re an industry that prioritizes marketing over engineering.

My favorite jobs were decades ago, back in what was, at least for me, the golden age of programming. Long before the central requirement became “just whack it out, we’ll fix it later”. What you don’t understand is a bug; it just may not have manifested yet.

Thursday, October 9, 2025

Experimentation

There are two basic ways of writing software code: experimentation and visualization.

With experimentation, you add a bunch of lines of code, then run it to see if it worked. As it is rather unlikely to work the first time, you modify some of the code and rerun. You keep this up until you a) get all the code you need and b) it does what you expect it to do.

For visualization, you think about what the code needs to do first. Maybe you draw a few pictures, but really, the functionality of the code is in your head. You are “seeing” it in some way. Then, once you are sure that that is the code you need, you type it out line by line to be as close as you can to the way you imagined it. After you’ve fixed typos and syntactic problems, the code should behave in the way you intended.

Experimentation is where everyone starts when they learn programming. You just have to keep trying things and changing them until the code behaves in the way you want it to.

What’s important, though, is that when the code does not work as expected, which is common, you dig a little to figure out why. Learn from failure. But some people will just keep making semi-random changes to the code, hoping to stumble on a working version.

That isn’t so bad when there are only a small number of permutations; you end up visiting most of them. But for bigger functionality, there can be a massive number of permutations, and in some cases, it can be infinite. If you are not learning from each failure, it could take an awfully long time before you stumble upon the right changes. By avoiding learning something from each failure, you cap your abilities at fairly small pieces of code.

Instead, the best approach is to hypothesize about what will happen each time before you run the code. When the result differs from your hypothesis, and it mostly will, you use that difference as a reason to dig into what’s underneath. Little by little, you will build up a stronger understanding of what each line of code does, what the lines do in combination, and how you can better leverage them. Randomly changing things and ignoring the failures wastes a lot of time and skips the learning you need to do.
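The hypothesize-run-compare loop can be sketched in a few lines. This is purely for illustration; the helper name `run_experiment` and the snippet under test are invented, and the point is only the discipline of stating an expectation before running:

```python
# A tiny sketch of hypothesis-driven experimentation (hypothetical helper).
# Before running, write down what you expect; when reality differs, dig in.

def run_experiment(description, hypothesis, experiment):
    """Run a snippet and compare the result against a stated hypothesis."""
    actual = experiment()
    if actual == hypothesis:
        print(f"{description}: confirmed ({actual!r})")
    else:
        # The mismatch is the interesting case: figure out *why* before moving on.
        print(f"{description}: expected {hypothesis!r}, got {actual!r} -- investigate")
    return actual

# Hypothesis: Python's // truncates toward zero, so -7 // 2 should be -3.
run_experiment("floor division of -7 by 2", -3, lambda: -7 // 2)
# It actually floors toward negative infinity, giving -4 -- a lesson learned
# from the failed hypothesis rather than a random retry.
```

The mismatch here is the whole point: the failed prediction tells you exactly where your mental model is wrong.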

Visualization comes later, once you’ve started to build up a strong internal model of what’s happening underneath. You don’t have to write code to see what happens; instead, you can decide what you want to happen and then just make the code do that. This opens the door not only to writing bigger things, but also to writing far more sophisticated things. A step closer to mastering coding.

Experimentation is still a necessity, though. Bad documentation, weak technologies, weird behaviours; modern software is a mess and getting a little worse each year. As long as we keep rushing through the construction, we’ll never get a strong, stable foundation. We’re so often building on quicksand these days.

Thursday, October 2, 2025

The Value of Thought

You can randomly issue millions of instructions to a computer.

It is possible that when they are executed, good things will happen, but the odds of that are infinitesimally small.

If you need a computer to do anything that is beyond trivial, then you will need a lot of carefully constructed instructions to make it succeed.

You could try to iterate your way into getting these instructions by experimentation, using trial and error. For all of the earlier iterations just before the final successful one, though, some amount of the included instructions will essentially be random, so as initially stated, the odds that you blunder into the right instructions are tiny.

Instead, even if you are doing some experimentation, you are doing that to build up an internal understanding of how the instructions relate back to the behaviors of the computer. You are building a mental model of how those instructions work.

To be good at programming, you end up having to be good at acquiring this knowledge and using it to quickly build up models. You have to think very carefully about what you are seeing, how it behaves, and what you’d prefer it to have done instead.

These thoughts allow you to build up an understanding that is then manifested as code: the instructions given to the computer.

Which is to say that ‘coding’ isn’t the effort, thinking is. Coding is the output from acquiring an understanding of the problem and a possible solution to it. The software is only as good as the thoughts put into it.

If you approach the work too shallowly, then the software will not fit all of the expected behaviours. If the problems to be solved are deep and complex, then the knowledge needed to craft a good solution will also be deep and complex.

We see and acknowledge the value of the existing code, essentially as a form of intellectual property, but we are not properly valuing the knowledge, skills, time, and deep thinking that are necessary to have created such code. Software is only as good as the understanding of the programmers who created it. If they are clueless, the software is close to random. If they only understand a little part of what they are doing, the missing knowledge is getting randomized.

The quality of software is the quality of the thoughts put into it by everyone who contributed to it. If the thinking diminishes over time due to turnover, the quality will follow suit. If the original authors lack the abilities or understanding, the quality will follow suit.

So we can effectively mark out zero quality as being any set of random permutations that maximizes the incorrect behaviors, or bugs, as we like to call them.

But we can also go the other way and say that a very small set of permutations that makes reasonable behavioral tradeoffs while converging very close to zero deficiencies (both in the code itself and in its behavior) is the highest achievable quality. You can only achieve high quality if you’ve taken the time to really understand each and every aspect of what behavior is necessary. The understanding of the authors would have to be nearly full and complete, with no blind spots. That is a huge amount of knowledge, which takes a long time to acquire, and needs a group of people to hold and apply, which is why we don’t see software at that high quality level very often.

We value artwork correctly, though. A particularly gifted artist’s work is not the value of the canvas, the frame, and the pigments applied. It is all that went into the artist's life that drove them to express their feelings in a particular painting. The Mona Lisa is a small canvas, but it has great value, well beyond its physical presence.

Code is the same way. A talented and super knowledgeable group of people can come together to craft something deep and extremely useful. Its usefulness and value go far beyond the code; it comes from the thoughts that were built up in order to bring it into existence.

When that is forgotten, people stop trying to think deeply, and the quality plummets as a direct result. Thought is valuable, code is just proof that it happened.

Thursday, September 25, 2025

Nexus Point

There have been these moments, over the last four decades, where the winds of change were wafting throughout the software industry.

They are not big, dramatic events, but just these little ripples that are the harbinger of change.

One day, my university roommate showed me this plugin for Emacs that he had gotten from somewhere. It displayed “hypertext” from some server in Europe. Pretty browsers came later, and then the web descended on us like a tsunami, dragging computers out of basements for all to see.

Some guy named Steve finally showed off all of the tech he liked crammed into one tiny package; small enough to be convenient, but useful enough to be addictive.

I bought this interesting textbook about patterns, then watched as it morphed the cool new programming language into being the next legacy code generator.

I read this newly published manifesto about how not to get lost in bureaucracy, only to watch it spawn off its own aggressive and bizarre cult, making the bureaucracy the good old days.

Movements in software follow the snowball trend. There are slight indications that something is up, but then, as it rolls downhill, it picks up speed and size, until it comes slamming down onto everyone unsuspecting below.

Seems like we are back at one of those moments. The winds of change are wafting again.

It’s not AI, though; that’s just a cute mechanical trick, occasionally impressive, but far too erratic to be reliable. It’s in the wake.

Silver bullets come and go regularly, but this time it seems like this one is finally forcing us to be more honest about how we build code. The myths and games that clouded the past are quickly getting dispersed.

We’ve spent decades running away from the truth; programming is not a magic art form. It is something people can do; we can train them to do it, and they could be doing a much better job at coding someday than they are doing right now. Maybe LLMs are the nail guns of our industry, or maybe it is just a passing fad, but either way, it highlights the parts of programming that are not creative.

Most of what we do is grinding out pretty basic code, and once the larger directions are established, it is just work. Code is at its best when it is boring, predictable, readable, and doesn’t wantonly waste resources. Clear and organized. Pedantic with lots of tedious attention paid to the details.

There is and will always be some code that is super special, and we know that leveraging a powerful abstraction will lift the game to the next level, but all of the other stuff that surrounds that small amount of code is just not that interesting. Boring enough that a clever mechanical process can emulate us grinding it out.

What we do with this understanding is the big question. Will we get introspective and figure out how to make most coding a more reliable, trustworthy, and deterministic pursuit, or will we continue to hide in delusions of grandeur? Will we creatively ignore most of the details, or will we use this new knowledge to refine how we approach the work? There is a chance here that we can lift software development up to a new level, which is important given how reliant we have become on the stuff.

Thursday, September 18, 2025

Codebase Organization

A messy working environment is a huge amount of unnecessary friction. The worse it is, the harder it becomes to do things. It slows everything down and degrades the quality of the output. Digital work environments are no different than physical ones.

Like any other profession, software developers need to keep their workspaces tidy. Their primary output is the codebases they are building. So their workspaces are usually the code, its artifacts, the builds on their workstations, the backups to source code control repositories, and the deployments to test environments. All this is necessary before being able to release software to any operational environments.

Organization is three things: a) a place for everything, b) everything in its place, and c) not too many similar things all in the same place.

We primarily work on code and configuration data. We’ll generally use at least two different programming languages, one as primary, the other for builds and automation. There are often secondary, indirect, but related resources for complex issues like persistence handling.

A place for everything means that if you have some new code or data, you know exactly where it should go. It’s not open for discussion; there aren't any choices. It exists, everyone knows about it, and there is just one place for it to go.

Organization is zero freedom. If you don’t put things in their place, it is disorganized. If you do that enough, it is a mess, and it becomes increasingly harder to find or deal with the stuff you already have. Creating new stuff in a huge, disorganized mess just makes the mess worse; it does not fix the problem.

That place is dictated by the architecture, which lays down a structure for all of the code and included artifacts. It is specific: code X belongs in file Y, in directory Z. If there is doubt or ambiguity as to the place, it is disorganized.

More importantly, if you have duplicate versions of code X, and they are located in two different places, this is disorganized. One of them is in the wrong place. Duplicate code is a direct form of disorganization.

Naming is considered hard, but that is because many programmers believe there is a lot of freedom involved. However, the name itself is also an artifact, and so it has a specific place too. That place is dictated by the naming conventions. To be organized, you need the naming conventions, and they should be explicit about where you ‘place’ the names. This not only includes any external references, but also variables, functions, etc. Every name in all of the code and the artifacts. Comments are a little outside of this, as they should be optional, extra knowledge that is not obvious from the code, the artifacts, the architecture, or the naming convention.

Again, if you are organized, you’d never end up with the same ‘thing’ called two different names, as one of those names is wrong. Good naming isn’t just an attribute of readability; it is also a big part of staying organized. Bad, inconsistent naming is a visible form of disorganization.
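As a hypothetical illustration of one-thing-one-name, here is a sketch assuming a convention of snake_case, verb-noun function names, and a single sanctioned domain term (`customer_id`, never a synonym or abbreviation). All names here are invented for the example:

```python
# Disorganized: the same concept, a customer identifier, under three names.
#   def fetch(custId): ...
#   def load_user(customer_number): ...
#   def get_client(cid): ...
# One of those names is wrong -- the 'thing' has no single place.

# Organized: one convention, applied everywhere, with zero freedom.
def get_customer(customer_id: int) -> dict:
    """Look up a customer record by its id (stub for illustration)."""
    return {"customer_id": customer_id}

def delete_customer(customer_id: int) -> None:
    """Remove a customer record by its id (stub for illustration)."""
    return None

print(get_customer(42)["customer_id"])  # -> 42
```

With the convention fixed, there is nothing to debate: anyone adding a new customer operation already knows what it must be called and what its parameter is named.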

This is wonderful and all, but in actual practice, strict organization takes a huge amount of time, and we are generally rushed while working. So, things will get messy -- disorganized -- but it's very, very important to stop, every so often, and do some cleanup. You can’t let the mess win, and you get more time back from cleaning up messy stuff than you lose doing it.

Cleanup is just refactoring. Moving things around to put them back into the places where they should have been originally. For some stuff, that might first mean deciding on the ‘place’ and then sticking to it consistently for all things in the codebase. It is essentially non-destructive (unless there are architectural or domain problems that get dragged in) and really is just moving things a little closer to being better organized.

Decide on a place for ‘things’ that you ignored earlier. Find that stuff and put it into its place. If one place ends up with too many slightly different things, break it up into two or more places.

If you keep doing this regularly, the code will converge on being well-written. If you don’t do it or are not allowed to do it, the mess will continue to grow, and grow, and grow until the friction becomes a hurricane.

Wednesday, September 10, 2025

Manifestations

The only two things in a computer are code and data.

Code is a list of instructions for a computer to follow. Data is a symbolic encoding of bits that represents something else.

In the simplest of terms, code is a manifestation of what a programmer knew when they wrote it. It’s a slight over-simplification, but not too far off.

More precisely, some code and some configuration data come directly from a programmer’s understanding.

There could be generated code as well. But in an oddball sense, the code that generated that code was the manifestation, so it is still there.

Any data in the system that has not been ‘collected’ is configuration data. It was understood and placed there by someone.

These days, most code comes from underlying dependencies. Libraries, frameworks, other systems, and products. Interactions with these are glued into the code. The glue code is the author’s understanding, and the dependency code is the understanding of all of the other authors who worked on it.

Wherever and however we boil it down, it comes down to something that some person understood at some point. Code does not spontaneously generate. At least not yet.

The organization and quality of the code come directly from its author. If they are disorganized, the code is disorganized. If they are confused, the code is confused. If they were rushed, the code is weak. The code is what they understand and are able to assemble as instructions for the computer to follow.

Computers are essentially deterministic machines, but the output of code is not guaranteed to be deterministic. There are plenty of direct and indirect ways of injecting non-determinism into code. Determinism is a highly valuable property; you really want it in code, where possible, because it is the anchor property for nearly all users' expectations. If the author does not understand how to do this, the code will not be deterministic, and it is far too easy to make mistakes.
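A minimal sketch of that point about non-determinism, using two common direct sources (clock time and random numbers) and the usual remedy of pushing them out to the caller. The function names are invented for illustration:

```python
import random
import time

def tag_nondeterministic(record: dict) -> dict:
    # Two calls with the same input produce different output: time and
    # randomness are dependencies on the outside world, injected silently.
    return {**record, "id": random.random(), "at": time.time()}

def tag_deterministic(record: dict, new_id: float, at: float) -> dict:
    # The same inputs always produce the same output; the non-deterministic
    # parts are pushed to the caller, where they can be controlled and tested.
    return {**record, "id": new_id, "at": at}

a = tag_deterministic({"name": "x"}, 1.0, 100.0)
b = tag_deterministic({"name": "x"}, 1.0, 100.0)
assert a == b  # repeatable, hence predictable and testable
```

The deterministic version anchors the user's expectations: given the same situation, the system does the same thing, every time.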

Code is so closely tied to the understanding of its authors that this has a lot of ramifications. The most obvious is that if you do not know something, you cannot write code to accomplish it. You can’t, because you do not know what that code should be.

You can use code from someone else who knows, but if there are gaps in their knowledge or it doesn’t quite apply to your situation, you cannot really fix it. You don’t know how to fix it. You can patch over the bad circumstances that you’ve found, but if they are just a drop in a very large bucket, they will keep flowing.

As a consequence, the combined output from a large group of novice programmers will not exceed their individual abilities. It doesn’t matter how many participate; it is capped by understanding. They might be able to glue a bunch of stuff together, as learning how to glue things is a lesser skill than coding them, but all of the risks associated with those dependencies are still there and magnified by the lack of knowledge.

As mentioned earlier, a code generator is just a second level of indirection for the coding issues. It still traces back to people. Any code constructed by any automated process has the same problem, even if that process is sophisticated. Training an LLM to be a dynamic, but still automated, process does not escape this limitation. The knowledge that flowed into the code just comes from more sources, is highly non-deterministic, and rather obviously has even more risk. It’s the same as adding more novice programmers into the mix; it just amplifies the problems. Famously, we are told that enough monkeys randomly typing on typewriters could eventually generate Shakespeare, but that says nothing about the billions of monkeys you’d need to do it, nor the effort to find that elusive needle in a rather massive haystack. It’s a tree falling in a forest with no one around.

For decades, there have been endless silver bullets launched in an attempt to separate code and configuration data away from the people who need to understand it. As Frederick P. Brooks pointed out in the 1970s, it is not possible. Someone has to issue the instructions, and they cannot do that if they don’t understand them. The work in building software is acquiring that understanding; the code is just the manifestation of that effort. If you don’t do the work, you will not get the software. If you get rid of the people who did the work, you will not be able to continue the work.

Friday, September 5, 2025

Sophistication

Software can be very addictive when you need to use it.

No doubt there are other ways to deal with your problems, but the software just clicks so nicely that you can’t really find any initiative to change.

What makes software addictive is sophistication.

It’s not just some clump of dumb, awkward features. The value is far more than the sum of the features because it all somehow comes together at a higher level.

Usually, it stems from some overarching form of abstraction. A guiding principle permeates all aspects of the work.

There is a simple interface on top that stretches far down into the depths. So, when you use it for a task, it makes it simple to get the work done, but it does the task so fully and completely, in a way that you can actually trust, that it could not have been done any better. You are not left with any lingering doubts or annoying side effects. You needed to do the task; the task is now done. Forever.

Crude software, on the other hand, gets you close, but you are left unsatisfied. It could have been done better; there are plenty of other little things that you have to do now to clean up. It’s not quite over. It’s never really over.

Sophistication is immensely hard to wire into software. It takes a great deal of empathy for the users and the ability to envision their whole path, from before the software gets involved to long afterward. It’s the notion that the features are only a small part of a larger whole, so they have to be carefully tailored for a tight fit.

It requires that you step away from the code, away from the technology, and put yourself directly into the user’s shoes. It is only from that perspective that you can see what ‘full’ and ‘complete’ actually mean.

It is incredibly hard to write sophisticated code. It isn’t just a bunch of algorithms, data structures, and configuration. Each and every tiny part of the system adds or subtracts value from the overall. So the code is deep and complex and often pushes right up against the boundaries of what is really possible with software. It isn’t over-engineering, but it sure ain’t simple either. The code goes straight into the full complexity and depth of the problem. Underneath, it isn’t crude, and it isn’t bloated. It’s a beautiful balance point, right and exactly where the user needs it to be.

Most people can’t pull sophistication out of thin air. It’s very hard to imagine it until you’ve seen it. It’s both fiddly and nitpicky, but also abstract and general. It sits there right in the middle with deep connections into both sides. That’s why it is so rare. The product of a grand master, not just someone dabbling in coding.

Once sophisticated code gets created, because it is so addictive, it has a very, very long lifetime. It outlasts its competitors and usually generations of hollow rewrites. Lots of people throw crude stuff up against it, but it survives.

Sophistication is not something you add quickly. Just the understanding of what it truly means is a long, slow, painful journey. You do not rush it; that only results in crude outcomes. It is a wonderful thing that is unfortunately not appreciated enough anymore.

Friday, August 22, 2025

Ordering

For any given amount of work, there is at least one order of doing that work which is the most efficient.

If there are dependencies between the sub-tasks, then there is also at least one order of doing the work that is least efficient.

For any two tasks, there may be obvious dependencies, but there may be non-obvious secondary ones as well. If one task requires speed and the other requires strength, even though they seem to be unrelated, there could be issues like muscle fatigue or tiredness if the same person performs both tasks.

With most tasks, most of the time, you should assume there are known and unknown dependencies, which means there is very likely, almost always, a most-efficient set of orderings. Assume it is rare for this not to be the case.

For any given dependency, its effect is to reweight the effort needed for both tasks. Doing the one task first means the second one will now take a little longer. We refer to this as friction on the second task.

Like dependencies, there is obvious friction and then non-obvious friction. If you do some task and many of the lower sub-tasks take a little longer, but you don’t know why, there is some non-obvious friction happening, which indicates that there are some non-obvious dependencies involved.

All this applies heavily to software development. When building a big system, there are ways through all of the tasks that are more efficient than others. From experience, the difference is a multiplier. You could spend 3x more effort building it one way than some other way, for example. In practice, I have seen much higher multipliers, like 10x or even crazy ones like 100x.
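As a toy illustration of how a dependency reweights effort, assume task B depends on task A, and that doing B first triples B's effort through rework when A later changes. The numbers and the friction multiplier are invented for the example:

```python
# Hypothetical effort figures for two dependent tasks (B depends on A).
base_effort = {"A": 5, "B": 3}
friction_multiplier = 3  # assumed rework factor when B is done before A settles

# Correct order: the foundational task first, no friction on B.
a_then_b = base_effort["A"] + base_effort["B"]

# Inverted order: B gets reworked as A changes underneath it.
b_then_a = base_effort["A"] + base_effort["B"] * friction_multiplier

print(a_then_b)  # -> 8
print(b_then_a)  # -> 14
```

Scale that single inversion up across hundreds of tasks over years of releases, and the multipliers mentioned above stop looking surprising.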

It’s sometimes not obvious, as large projects span long periods and have many releases in between. You’d have to step back and observe the whole lifecycle of any given project to get a real sense of the damage that some more subtle types of friction have caused.

But the roots of the friction are often the same. Someone is trying to do one task that is dependent on another one before the foundational task is completed. Which means that changes to the lower task as it goes are causing extra work for the higher one.

We can skip over architectural discussions about height and just simply assess whether one piece of code or data depends on another piece of code or data. That is a dependency which, when handled out of order, creates friction.

Overall, it always means that you should build things from the bottom up. That would always be the most efficient way of getting through the tasks. Practically, that is not always possible, at least overall, but it is often possible within thin verticals in the system. If you add a new feature, it would be most efficient to address the modelling and persistence of the data first, then gradually wire it in from there until you get to the user interface. What might have driven the need for such a feature was the user experience or their domain problems, and that analysis is needed before the coding starts, which is top-down. But after that, you flip the order for the implementation to bottom-up, and that would be the fastest way to make it happen.
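The bottom-up thin vertical can be sketched as follows, using a hypothetical task-tracking feature. The names (`Task`, `TaskStore`, `render`) are invented for illustration; the point is the build order: model, then persistence, then logic, then the interface last:

```python
from dataclasses import dataclass

# 1. Model: decide what the data actually is.
@dataclass
class Task:
    title: str
    done: bool = False

# 2. Persistence: how it is stored (an in-memory stand-in here).
class TaskStore:
    def __init__(self):
        self._tasks = []

    def add(self, task: Task) -> None:
        self._tasks.append(task)

    # 3. Logic: operations the feature needs, built on the layer below.
    def complete(self, title: str) -> None:
        for t in self._tasks:
            if t.title == title:
                t.done = True

    def pending(self) -> list:
        return [t for t in self._tasks if not t.done]

# 4. Interface: the last, thinnest layer, wired on top.
def render(store: TaskStore) -> str:
    return ", ".join(t.title for t in store.pending()) or "(nothing pending)"

store = TaskStore()
store.add(Task("write post"))
store.add(Task("edit post"))
store.complete("write post")
print(render(store))  # -> edit post
```

Each layer only depends on the one below it, so by the time the interface is wired up, everything underneath is already settled and stable.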

That the order is flipped depending on the stage is counterintuitive, which is why it is so controversial. But if you work it back from the first principles above, you can see why this happens.

In development, order is important. If you build too slowly, that spins off politics, which often starts to further degrade the order, so getting control of this is vital to getting the work out as efficiently and smoothly as possible. Most people will suggest inefficient orders, based on their own understanding, so it is better not to leave it in the hands of most people.

Friday, August 15, 2025

Bad Engineering

Lots of people believe that if you just decompose a big problem into enough little ones, it is solved.

A lot of the time, though, the decomposition isn’t into complete sub-problems but just partial ones. The person identifies a sub-problem, they bite off a little part of it, and then push the rest back out.

A good example is when some data needs processing, so someone builds a middleware solution to help, but makes it configurable so the actual processing can be injected. It effectively wraps the processing, but doesn’t include any actual processing, or only minimal versions of it.

Then someone comes along and needs that processing. They learn this new tech, but then later realize that it doesn’t really solve their problem, and now they have a lot more fragments that still need to be solved.

Really, it just splintered into sub-problems; it didn’t solve anything. It’s pure fragmentation, not encapsulation. It’s not really a black box if it’s just a shell to hold the actual boxes …
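A hypothetical sketch of such a shell: a configurable "processing framework" where every real step is injected by the caller. The class and parameter names are invented for illustration:

```python
class ProcessingPipeline:
    """Looks like it processes data, but every real step is injected."""

    def __init__(self, parser=None, transformer=None, writer=None):
        # The actual parsing, transforming, and writing -- the hard parts --
        # remain the caller's problem, now split into three fragments.
        self.parser = parser or (lambda raw: raw)
        self.transformer = transformer or (lambda rec: rec)
        self.writer = writer or (lambda rec: rec)

    def run(self, raw):
        return self.writer(self.transformer(self.parser(raw)))

# The 'framework' solved nothing: to use it, you still write all three
# fragments yourself, plus learn how the framework wires them together.
pipeline = ProcessingPipeline(
    parser=lambda raw: raw.split(","),
    transformer=lambda rec: [field.strip().upper() for field in rec],
    writer=lambda rec: "|".join(rec),
)
print(pipeline.run("a, b, c"))  # -> A|B|C
```

Note what the defaults do: with nothing injected, the pipeline is the identity function. That is the tell of splinter tech -- strip away the configuration and no problem has actually been solved.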

If you do this a lot as the basis for a system, the sheer number of moving parts will make the system extraordinarily fragile. One tiny, unfortunate change in any of the fragments and it all goes horribly wrong. Worse, that bad change is brutal to find, as it could have been anywhere in any fragment. If it wasn’t well organized and repo’d, then it is a needle in a haystack.

Worse, each fragment's injection is different from all of the other fragments’ injections. There is a lot of personality in each component configuration. So instead of having to understand the problem, you now have to understand all of these fragmented variations and how they should all come together, which is often far more complex than the original problem. So you think you've solved it, but instead you just made it worse.

If you look at many popular tech stacks, you see a huge amount of splinter tech dumped there.

They become popular because people think they are a shortcut to not having to understand the problems, and only realize too late that it is the long, treacherous road instead.

Companies like to build splinter tech because it is fast and relatively easy to get to market. You can make great marketing claims, and by the time the grunts figure it out, it is too late to toss, so it is sticky.

Splinter tech is bad engineering. It is both bloat and obfuscation. Fragments are a big complexity multiplier. A little of it might be necessary, but it stacks up quickly. Once it is out of control, there is no easy way back.

It hurts programmers because they end up learning all these ‘component du jour’ oddities, then the industry moves on, and that knowledge is useless. Some other group of splinter tech hackers will find a completely different and weird way of doing similar things later. So it's temporary knowledge with little intrinsic value. Most of this tech has a ten-year or less lifespan. Here today, gone tomorrow. Eventually, people wake up and realize they were duped.

If you build on tech with a short life span, it will mostly cripple your work’s lifespan. The idea is not to grind out code, but to solve problems in ways that stay solved. If it decays rapidly, it is a demo, not a system. There is a huge difference between those two.

If you build on top of bad engineering, then that will define your work. It is bad by construction. You cannot usually un-bad it if you’re just a layer of light work or glue on top. Its badness percolates upwards. Your stuff only works as well as the components it was built on.

Friday, August 8, 2025

Static vs Dynamic

I like the expression ‘the rubber meets the road’.

I guess it is an expression about driving, where the rubber is the tires, but it also applies in a rather interesting way to software.

When a software program runs, it issues millions, if not billions, of very, very specific instructions for the computer to follow.

When we code, we can add variability to that, so we can make one parameter an integer, and we can issue the exact same instructions but with different values. We issue them for value 20, then we issue them again for 202, for example.

That, relative to the above expression, is the rubber meeting the road twice, once for each value.

Pull back a little from that, and what we have is a ‘context’ of variability, that we actuate to get the instructions with a rather specific value for each variable.

In programming, if we just hardcode a value into place, it is not a variable. We tend to call this ‘static’, being that it doesn’t change. When the rubber hits the road, it was already hardcoded.

If we allow it to vary, then the code is at least ‘dynamic’ on that variable. We pick from a list of possible options, then shove it in, and execute the whole thing.

The way we can pick can be picking directly from a list of possible values, or we can have ‘levels of indirection’. We could have a ‘pointer’ in the list that we use to go somewhere else and get the value, thus one level of indirection. Or we could stack the indirections so that we have to visit a whole bunch of different places before the rubber finally meets the road.
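As a rough Python sketch (the names are hypothetical), the same computation moves from static, to dynamic, to stacked levels of indirection like this:

```python
# Static: the value is hardcoded before the rubber meets the road.
def area_static():
    return 20 * 20

# Dynamic: one variable, bound when the code actually runs.
def area_dynamic(side):
    return side * side

# One level of indirection: a key points somewhere else for the value.
SIDES = {"small": 20, "large": 202}
def area_indirect(size_name):
    return area_dynamic(SIDES[size_name])

# Two levels: an alias points at a key that points at the value.
ALIASES = {"default": "small"}
def area_very_indirect(alias):
    return area_indirect(ALIASES[alias])

print(area_static(), area_dynamic(202), area_very_indirect("default"))
```

Each layer buys some flexibility, and each layer is one more place to look when the result is wrong.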

With the instructions, we can pretty much make any of the data they need variable. But we can also make the instructions variable, and oddly, the number of instructions can vary too. So, we have degrees of dynamic behaviour, and on top, we can throw in all sorts of levels of indirection.

From a complexity perspective, for each and every thing we make dynamic and for each and every level of indirection, we have kicked up the complexity. Static is the simplest we can do, as we need that instruction to exist and do its thing. Everything else is more complex on top.

From an expressibility and redundancy perspective, making a lot of stuff dynamic is better. You don’t have to have similar instructions over and over again, and you can use them for a much wider range of problems.

If you were to make a specific program fully dynamic, you would actually just end up with a domain programming language. Taken too far, since the rubber has to meet the road at some point at runtime, the code itself ends up being refactored into a full language. We see this happen quite often: so many features get piled on that eventually someone points out it has become Turing complete. You’ve gone a little too far at that point, unless the point was to build a DSL. SQL being Turing complete, for instance, is actually fine; full persistence solutions are DSLs almost by definition. Newer implementations of REs being Turing complete, however, is a huge mistake, since that corrodes the polymorphic behaviour guarantees that make REs so useful.

All of this gets us back to the fundamental tradeoff between static and dynamic. Crafting similar things over and over again is massively time-consuming. Doing it once, but making some parts variable is far better. But making everything dynamic goes too far, and the rubber still needs to meet the road. Making just enough dynamic that you can reuse it everywhere is the goal, but throwing in too many levels of indirection is essentially just fragmenting it all into a nightmare.

There is no one-size-fits-all approach that always works, but for any given project, there is a better degree of dynamic code that is the most efficient over the longer term. So if you know that you’ll use the same big lump of code 7 times in the solution, then adding enough variability to cover all 7 with the same piece of code is best, and getting all 7 static configs for this in the same place is perfect. That would minimize everything, so the best you can do.

Friday, August 1, 2025

Encapsulation vs Fragmentation, Again

Long ago, I noticed a trend. Coming out of the eighties, people had been taking deep abstractions and encapsulating them into very powerful computational engines. That approach gave rise to formalized variations like data structures, object-oriented programming, etc.

But as the abstractions grew more sophisticated, there was a backlash. The industry was exploding in size, and with more new people, a lot of programmers wanted things to be simpler and more independent. Leveraging abstractions requires learning and thinking, but that slows down programming.

So we started to see this turn towards fragmented technologies. Instead of putting your smarts all in one place, you would just scattershot the logic everywhere. Which, at least initially, was faster.

If you step back a bit, it is really about individual programmers. Do you want to slowly build on all of these deep, complicated technologies, or just chuck out crude stuff and claim success? Personal computers, the web, and mobile all strove for decentralization, which you leveraged with lots of tiny fragments. Then you only had to come up with a clever new fragment, and you were happy.

Ultimately, it is an organizing problem. A few fragments are fine, but once there are too many, the complexity has been so amplified by the sheer number of them that it is unmanageable. Doomed.

Once you have too many, you’ll never get it stable; you fix one fragment, and it breaks a couple of others. If you keep that up, eventually you cycle all the way back around again and start unfixing your earlier fixes. This is pretty much guaranteed at scale, because the twisted interconnections between all of the implicit contextual dependencies are a massive Gordian knot.

Get enough fragments, and it is over. Every time, guaranteed.

Oddly, the industry keeps heading directly into fragmentation, promoting it as the perfect solution, then watching it slowly blow up. After which it will admit there was a problem, switch to some other new fragmented potential, and do it all over again. And again.

I guess microservices have become a rather recent example. 

We tried something similar in the early '90s, but it did not end well. A little past the turn of the century, that weed sprang up again.

People started running around saying that monoliths are bad. Which isn’t quite true: all of your pieces are together in one central place, which is good, but the cost of that is limits on how far you can scale them.

The problem isn’t centralization itself, but rather that scaling is not and never will be infinite. The design for any piece of software constrains it to run well within just a particular range of scale. It’s essentially a mechanical problem dictated by the physics of our universe.

Still, a movement spawned off that insisted that with microservices, you could achieve infinite scaling. And it was popular with programmers because they could build tiny things and throw them into this giant pot without having to coordinate their work with others. Suddenly, microservices were everywhere, and if you weren't doing them, you were doing it wrong. The fragmentation party was in full swing.

There was an old argument on the operating system side between monolithic kernels and microkernels. Strangely, most of the industry went with one big messy thing, but ironically, the difference was about encapsulation, not fragmentation. So what we ended up with was one big puddle of grossly fragmented modules, libraries, and binaries that we called a monolith, since that was on top. Instead of a more abstracted and encapsulated architecture that imposed tighter organizational constraints on the pieces below.

So it was weird that we abused the terminology to hide fragmentation, then countered a bit later with a fully fragmented ‘micro’ services approach with the opposite name. Software really is an inherently crazy industry if you watch it long enough.

These days, there seems to be a microservices backlash, which isn’t surprising given that it is possibly the worst thing you can do if you are intentionally building a medium-sized system. Most systems are medium-sized. 

Whenever you try to simplify anything by throwing away any sort of organizing constraints, it does not end well. A ball of disorganized code, data, or configs is a dead man walking. Even if it sort of works today, it’s pretty much doomed long before it pays for itself. It is a waste of time, resources, and effort.

All in all, though, the issue is just about the pieces. If they are all together in one place, it is better. If they are together and wrapped up nicely with a bow, it is even better still.

If they are strewn everywhere, it is a mess, and what is always true about a mess is that if it keeps growing, it will eventually become so laborious to reverse its inherent badness that starting over again is a much better (though still bad) choice. 

The right answer is to not make a mess in the first place, even if that is slower and involves coordinating your work with a lot of other people.

The best answer is still to get it all into reusable, composable pieces so that you can leverage it to solve larger and larger problems quickly and reliably. That has been and will always be the most efficient way forward. When we encapsulate, we contain the complexity. When we fragment, it acts as a complexity multiplier. Serious software isn’t about writing code; it is about controlling complexity. That has not changed in decades, even though people prefer to pretend that it has.

Friday, July 25, 2025

Determinism

Having been around for a long time, I often realize that when I use terms like ‘determinism’, I have a slightly different, somewhat deeper sense of its meaning.

In general, something is deterministic if, no matter how often you do it, the results are always the same. Not similar, or close, but actually the same.

Computers are interesting beasts. They combine the abstract formalism of mathematics with a strong footprint in reality, as physical machines. Determinism is an abstract concept. You do something and 100% of the time, the results are the same. That we pile on massive amounts of instructions on top of these formal systems and interpret them with respect to our humanity does not change the notion of determinism. What does mess with it a bit is that footprint in reality.

Hardware is physical and subject to the informal whims of the world around us. So, sometimes it fails.

Within software, though, we effectively disconnect ourselves from that binding to reality. We ignore it. So, we do say that an algorithm is deterministic, in the abstract sense, even if it is running on hardware that effectively injects some nondeterminism into the mix. I could probably go on forever about that touchpoint, but given that we choose to ignore it, that is all that really matters.

So, in that sense, without respect to reality, we can say that an algorithm is deterministic. Given the same inputs, you will always get the same outputs, every time. More importantly, a mandatory property of something actually being an algorithm is determinism. We do have a term for sets of instructions that do not work absolutely reliably, really just best efforts: we call them heuristics. A heuristic will do its best to get an answer, but for any number of reasons, it will not be 100%. It may be 99.9999%, but that .0001% failure rate, when done often enough, is actually significant.
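To make the distinction concrete, here is a hedged Python sketch (a toy problem, with invented names): an algorithm that always finds the maximum, next to a sampling heuristic that usually does.

```python
import random

# An algorithm: deterministic, the same inputs always yield the
# same output, every single time.
def max_algorithmic(values):
    best = values[0]
    for v in values[1:]:
        if v > best:
            best = v
    return best

# A heuristic: sample a few elements and hope the maximum was among
# them. Usually close, sometimes wrong, and at scale that matters.
def max_heuristic(values, samples=3, seed=None):
    rng = random.Random(seed)
    return max(rng.choice(values) for _ in range(samples))

print(max_algorithmic([3, 1, 4, 1, 5, 9, 2, 6]))  # always 9
```

The heuristic is cheaper per call, but no amount of repetition makes it reliable; the algorithm is reliable by construction.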

All of this is more important than just being a theoretical discussion. What we need from and what people expect from software is determinism. They need software they can rely on, each and every time they go to use it. It is the core unstated requirement of basically every piece of software out there, with the exception of code that we know is theoretically close to being impossible. A heuristic would never do when an algorithm exists.

The classic example of this is hiding in plain sight. A graphical user interface is a ‘pretty’ means of interacting with computers. You do something like press a button on-screen, and that triggers one or more computers to do some work for you. That’s nice.

You press the button, and the work gets done. The work itself should be deterministic. So, each time you press the button, the results are the same.

No doubt people have seen plenty of interfaces where this is not true. In the early days of the web, for example, we had a lot of issues with ‘double clicks’ until we started building in double click protection to ignore the second click if an earlier one was in play. We did that to avoid burning resources, but we also did it to restore some determinism to the interface. People would get annoyed if, for example, they accidentally double-clicked and that caused the software to break or do weird things. It would ‘bug’ them, but really, what it did was violate their expectations that their interaction with the interface was deterministic, which is key.
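Double-click protection is a small amount of code. The web versions were in JavaScript, of course, but the idea can be sketched in Python (names invented for illustration): ignore a second click while an earlier one is still in play.

```python
import time

class ClickGuard:
    """Ignore a second click while an earlier one is still in play."""
    def __init__(self, action, cooldown=0.5):
        self.action = action        # the work the button triggers
        self.cooldown = cooldown    # seconds during which repeats are ignored
        self.last_click = float("-inf")

    def click(self):
        now = time.monotonic()
        if now - self.last_click < self.cooldown:
            return None             # accidental double click, ignored
        self.last_click = now
        return self.action()

guard = ClickGuard(lambda: "submitted", cooldown=0.5)
print(guard.click())  # the first click does the work
print(guard.click())  # the immediate second click is ignored
```

The guard restores determinism to the interaction: one intent, one result, no matter how twitchy the finger.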

So, a single click can and should be deterministic, but what about a series of them?

One of the bad habits of modern programmers is that they push too much of their workload into GUIs. They think because there is an interface where they can click on everything they need, and that each click is in itself deterministic, that it is a good way of getting tasks done. The problem is not the buttons, but what lies between them.

If you always have to click 3 buttons to get a specific result, it is probably fine. But once that grows to 10 buttons, or 50 buttons, or, as it seems in some cases, 100 buttons, the determinism fails rather dramatically. It’s not the software, though; it is the person in between. We are heuristic. Experts strive to be deterministic, but we are battling against our very nature to be absolutely precise absolutely every time. And that plays out, as one might expect, in long button sequences. Sometimes you hit the 100 in the right order, as desired, but sometimes you don’t. Maybe you hit 99 of them, or in the middle, the order is slightly different. It doesn’t matter, in that we know that people are not deterministic, and we can absolutely depend on that being the case.

If you wired up one button to hit the other 100, then you are back to being deterministic again, but if you don’t do that, then using the GUI for any non-trivial task is non-deterministic, simply because people are non-deterministic.

This is exactly why so many old and experienced programmers keep trying to get people to script stuff instead. If you have a script, and you give it the same inputs, then if it was written properly, when it runs, it will give you the exact same outputs, every time. And it is easy to write scripts with no arguments on top of scripts that have some variability to make it better.

If you were going to do a big release of complicated software, if the release process is a bunch of button clicks in a bunch of different apps, you would be asking for trouble. But if it was just one script called ‘release.sh’ in one place, with no arguments, then your release process would be fully, completely, and totally deterministic.

If there is some unwanted variability that you’ve injected into the process, then that acts as a particularly nasty bit of friction. First, it should scare you to do a release if there is a possibility that you might do it incorrectly. Second, when it is incorrect, the cleanup from having messed it up is often quite expensive. What happens then is that it might work a few times initially, but then people get tired and it goes wrong. Then they get scared, and it either slows everything down out of fear or it keeps going wrong, and it makes it all worse.

That then is why determinism is just so important to software developers. It might be easy to play with a GUI and do things, but you’ve given up determinism, which will eventually bite you in the hand, just when you can’t afford that type of mistake. It’s high risk and high friction. Both of which are now making it harder to get stuff done as needed.

It takes a lot longer to script everything, but once you are on your way, it gets easier and easier as you’ve built up the foundations for getting more and more stuff done. As you go, the scripts get battle-tested, so they rather naturally act as their own test harness. If you fix the scripts instead of avoiding them, you get to this point where tasks like releases are so easy and reliable that there is very little friction to getting them done. The only thing stopping you from doing it too frequently is whether or not they are needed right away. This is the root of ideas like CI/CD pipelines. You’ll have to release often, so it needs to be deterministic.

Determinism plays out in all sorts of other ways within software. And usually the lack of it triggers relatively small side effects that are too often ignored, but build up. If you look for it in the code, in the technologies, in the process, and everywhere else, you find that getting closer to or achieving it is drastically reducing friction, which is making the job better and far less painful.

So it’s more than just a type of state machine, the entropy of hardware, or the noise on a network. It is a fundamental necessity for most of the solutions we build.

Friday, July 18, 2025

Anything Goes Style

In anything goes style, you code whatever works. You do not question it; if the results appear to be more or less correct when you run it on your machine, you ship it.

Anything goes style is often paired with brute force style. So you get these mega functions of insanely mixed logic that are deeply nested, and the code often does all sorts of bizarre, wasteful, and disorganized things. Generally, it has more bugs, and they are rarely fixed correctly since the logic is convoluted and fragile.

Anything goes style also burns resources like they are free and is a primary driver of bloat. It uses way more memory than it needs, relentlessly beats the disk to no effect, and litters the network with countless useless packets.

Modern hardware hides it, but when you see a lot of it congregating together, it is obvious that it is spending too much time doing useless work. We often see large software packages growing faster on disk than their added features.

The style became more popular with languages like PHP and JavaScript, but it got an epic shot of adrenaline with containers. No longer was it obvious that the code was awful when you can just package up the whole development machine and ship that directly, in all its inherent ugliness.

Anything goes is often the coding style at the root of security failures. The code is so obfuscated it can’t be reviewed, and the containers are opaque. That it isn’t doing its work properly isn’t noticed until it is too late and has already been exploited. A variant is to wire up overly expressive dependencies for simple tasks but not lock them down properly, so the whole thing has more holes than Swiss cheese.

Some people argue that it’s a programmer’s job to toss out their work as quickly as possible. Why spend extra time making sure infrequent things like security breaches don’t happen? This has led to some epic failures and a growing frustration amongst computer users that software is ruining our world. It is the tragic opposite of engineering. Our job is not to create more software, but rather it is to solve people's problems with reliable software.

Other styles include:
https://theprogrammersparadox.blogspot.com/2025/06/brute-force-style.html
https://theprogrammersparadox.blogspot.com/2025/05/house-of-cards-style.html
https://theprogrammersparadox.blogspot.com/2023/04/waterloo-style.html

Friday, July 11, 2025

Assumptions

You’re asked to build a big system that solves a complex business domain problem.

But you don’t know anything about the business domain, or the actual process of handling it, and there are some gaping holes in your technology knowledge for the stack that you need to make it all work properly. What do you do?

Your biggest problem is far too many unknowns. Known unknowns and unknown unknowns. A big difficulty with software development is that we often solve this by diving in anyway, instead of addressing it proactively.

So we make a lot of assumptions. Tonnes of them.

We usually work with a vague understanding of the technologies. Either we ignore the business domain, or our understanding is so grossly over-simplified that it is dangerous. This is why there is so little empathy in our fragile creations.

It would be nice if this changed, but it does not, and has only gotten worse with time.

So instead, we need to deal with it.

First is to assume that almost everything is an assumption. Second is to insert enough flexibility into the work so that only minimal parts of it are lost if your assumptions are wrong.

For technical issues, on occasion, you can blindly guess correctly. More often, if you just follow the trends, for example, in a GUI, do whatever everybody else is doing, it's less likely to change. It’s a mixed bag, though, in that some super popular trends are actually really bad ideas, so it’s good to be a little sceptical. Hedge your bets and avoid things that are just too new and don’t have staying power.

But for business stuff, which is often far away from what you imagine it to be, it is never easy. The obvious point is to go learn about how it actually works. Or get an expert and trust them fully. Often, that is not an option.

The other side is to be objective about it. Is it actually something that could be handled in multiple different ways? And how many possible variations in handling it can you imagine?

Valuations and pricing are good examples where people are usually very surprised at how different the actual reality is from what they might have guessed. Mostly because the most obvious ways of dealing with them are not practical, and a lot of history has flowed under the bridge already. If you have zero real exposure and you guess, it will most certainly be wrong.

The key is that if you do not know for certain, the code itself should not be static. That is, the code mirrors your own certainty of your own assumptions. Static if you are absolutely 1000% certain, dynamic if you are not.

If you think there might be ten ways to do something, then you implement the one you guessed is likely and make it polymorphic. As others pop up, it is easy to add them too. It takes a little more effort to make something encapsulated and polymorphic, but if you are right about being wrong, you just saved yourself some big trouble and a few bad days.
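One way to hedge that in code, sketched in Python (the domain and all names are hypothetical): implement only the variation you guessed is in use, but put it behind an interface so the other nine can be slotted in later.

```python
from abc import ABC, abstractmethod

class PricingRule(ABC):
    """The hedge: an interface where we only guessed one variation."""
    @abstractmethod
    def price(self, base: float) -> float: ...

class FlatMarkup(PricingRule):
    """The one variation we believe is actually in use today."""
    def __init__(self, markup: float):
        self.markup = markup

    def price(self, base: float) -> float:
        return base + self.markup

# The rest of the system depends only on the interface, so when the
# assumption turns out to be wrong, only a new subclass is needed.
def invoice_total(items, rule: PricingRule) -> float:
    return sum(rule.price(b) for b in items)

print(invoice_total([10.0, 20.0], FlatMarkup(2.5)))  # 35.0
```

When the real pricing rule surfaces, it becomes another `PricingRule` subclass; `invoice_total` and everything above it never changes.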

Flipping that around, scope creep isn’t often really scope creep, but more of assumption convergence. People assumed that a simple, trivial feature would do the trick, but at some point they were enlightened into realizing that was incorrect, so now the code has to do far more than they initially believed that it should. Knowledge was gained; the design and implementations should be updated to reflect that. What already exists should be properly refactored now.

In development projects where the coders don’t want to know anything about the underlying business problems, they get angry at the domain experts for not having known this sooner. In projects where the coders care about the outcomes, they are keen to resolve this properly. The difference is whether you see the job as churning specifications into code or as solving people's problems with code.

A while back, there was a lot of resistance to what was termed speculative generalization. If you could save yourself a few days by not making something encapsulated or polymorphic, it was argued that you should save those days. The problem was that when paired with a lack of caring about what the code was supposed to do, stubbornness in insisting that nothing should change just generated a lot of drama. And that drama and all of the communication around it eats up a tremendous amount of time. The politics flows fast and furious, so it drains the life out of everything else in the project. Everybody’s miserable, and it has eaten far more time than if you just made the change. People used to blame this on the waterfall process, but it is just as ugly and messy in lightweight methodologies.

With that in mind, a little extra time to avoid that larger and more difficult path is a lot of time saved. Just that you should not really forecast where the code base will grow, but instead just work to hedge your own lack of certainty.

It’s a shifting goal, though. As you build more similar things, you learn more and assume less. You know what can be static and what should likely be dynamic. Any other developer will disagree with you, since their experiences and knowledge are different. That makes it hard to get all of the developers on the same page, but development goes way smoother if they are all on the same page and can interchange and back each other up. That is why small, highly synced “tiger” teams can outcode much bigger projects.

It can be hard when something is counterintuitive to convince others that it is what it is. That is a trust and communication issue between the developers themselves. Their collective certainty changes the way they need to code. If it's mandated above or externally, it usually assumes total uncertainty, and so everything is dynamic and thus overengineered. That worst-case scenario is why it aggravates people.

The key, though, is always being objective about what you actually know for certain. If you can step back and not take it personally, you can make good choices in how to hedge your implementation and thus avoid all sorts of negative outcomes. If you get it nearly right, and you’ve focused on readability, defensive coding, and all the other proactive techniques, then releasing the code will be as smooth as you can get it.

If you go the other way and churn a ball of mud, the explosion from releasing it will be bigger than the work to create it. Just as you think it is going out the door and is over, it all blows up in your face, which is not pleasant. They’ll eventually forgive you for being late if it was smooth, but most other negative variations are often fatal.

Thus, the adage “assumptions kill”, but in an industry that is built around and addicted to assumptions, you are already dead, you just don’t know it yet.

Friday, July 4, 2025

Industrial Strength

Software that continues to correctly solve a problem, no matter what chaos surrounds it, is industrial strength. It just works, it always works.

It is about reliability and expectations. The code is there when you need it and it behaves in exactly the way you knew it would.

You can be certain if you build on top of it, that it won’t let you down. There will always be problems, but the industrial-strength stuff is rarely one of them.

If code isn’t industrial strength it is a toy. It may do something cute, it may be clever. The results could be fascinating and highly entertaining. But it is still a toy.

You don’t want to build something serious on top of toys. They break when least expected, and they’ll do strange things periodically. They inject instability into whatever you are doing and whatever you have built on top.

Only toys can be built on other toys; they’ll never be industrial strength. Something fragile that breaks often can’t be counterbalanced properly. Contained perhaps, but the toy nature remains.

Lots of people market toys as if they were industrial strength. They are not, and highly unlikely to ever be. Toys don’t “mature” into industrial strength, they just get more flaky as time goes on.

Industrial strength is a quality that has to be baked into every facet of the design right from day one. It is not accidental. You don’t discover it or blunder into it. You take the state-of-the-art deep knowledge, careful requirements, and desired properties, then you make difficult tradeoffs to balance out the competing concerns. Industrial strength is always intentional and only ever by people who really understand what it means.

There is nothing wrong with toy software, it can be instructive, fun, or entertaining. But for the things that we really depend on, we absolutely do not want toy software involved. Some problems in life are serious and need to be treated seriously.

Friday, June 27, 2025

Building Things

One of my greatest curiosities over the last thirty-five years, while building complex software, is why construction projects for skyscrapers, which are highly complex beasts, seem to go so much smoother than software projects, most of which are far less complex.

When I was young, legend has it that at least 50% of all software projects would fail to produce working code. On top of that, most of what did eventually work was of fairly low quality. People used to talk a lot about the huge risks of development.

Since then, there have been endless waves of ‘magic’ with bold marketing claims, intended to change that, but it seems that our modern success rate is roughly similar. Of the projects that I have seen, interacted with, or discussed with fellow developers, the success rate is poor. It actually seems to have gotten worse. Although outright failures are a little less common, the overall quality is far lower. Balls of mud are commonplace.

At some level, it doesn't make sense. People can come together to build extremely sophisticated things reliably in all sorts of other industries. So why is software special?

Some people claim it is because it is purely “digital”. That makes it abstract and highly malleable, somewhat like jello. But somewhere, the rubber always meets the road; software sits on top of hardware, which is entirely physical.

Computer systems are physical objects. They may seem easier to change in theory, but due to their weight, they more often get frozen in place. Sure, at the last layer on top, close to the users, the coders can make an infinite number of arbitrary changes, but even when it is actual “code”, it is usually just configuration-based or small algorithmic tweaks. All the other crud sinks quickly to the bottom and becomes untouchable.

You can fiddle close to the top in any other medium. Fiddle with the last layers of wood, concrete, rubber, plastic, or metal. Renovate the inside or outside of a building. So that doesn’t seem to be the real issue.

Another possibility is the iceberg theory. Most of the people involved with building software only see the top 10% that sticks out of the water. The rest is hidden away below the waves. We do oddly say that the last 10% of any project is 90% of the effort, which seems to line up.

I used to think that was just non-technical people looking at GUIs, but these days it even seems true for backend developers blindly integrating components that they don’t really understand. All other forms of engineering specialize and insist on real depth of knowledge. The people entrusted to build specific parts have specific knowledge.

Software, because it has grown so fast, has a bad habit of avoiding that. Specialties come and go, but generalists are more likely to survive for their full careers. And generalists love to ignore the state of the art and go rogue. They prefer to code something easy for themselves rather than code things correctly. They’ll often go right back to first principles and attempt to quickly reinvent decades of effort. These bad habits are reinforced by an industry that enshrines them as cool or popular, so it can sell pretend cures to fix them. Why do things properly when you can just buy another product to hide the flaws?

I tend to think that impatience is the strongest factor. I was taught to estimate pretty accurately when I was young, but most of the time, the people controlling the effort would be horrified to learn of those estimates. They are way too long. They want something in days, weeks, or months, but getting the implicit quality that they don’t realize they actually need would sometimes take years. Their expectations are way, way off. But if you tell them the truth, they will just go to someone else who will tell them what they want to hear. The industry has told them that everything is quick and easy now, so you can’t contradict that.

That doesn’t happen with big things like skyscrapers simply because there are regulators and inspectors involved at every step of the way to prevent it. Without them, skyscrapers would collapse quite regularly, which would kill an awful lot of people. Most societies can’t accept that, so they work hard to prevent it. Software failures, on the other hand, are extremely common these days, but death from them isn’t, so the industry is held to a lesser standard. That shows.

I do miss the days, long gone, when we actually had the time to craft stuff properly. Libraries and frameworks were scarce; we usually had to roll everything ourselves except for persistence. That was a mixed blessing. Good devs would produce strong custom work that enhanced their codebases, but you’d see some exceptionally awful stuff too.

Getting masses of half-baked dependencies didn’t change that. Sure, it is easier for mediocre devs to hook up lots of stuff, but it is still possible for them to hook it up badly, and a lot of the parts are defective. So, lots more complexity that still doesn’t work. Same difference, really. Just fewer real success stories.

All in all, it is a pretty weird industry. We learn stuff, then forget it, so that we have fun redoing it in a cruder way. That keeps the money rolling for everyone, but has clearly made our modern world extremely fragile.

Whacking together something fast isn’t better if the shelf life is far shorter. What comes together quickly falls apart quickly, too. Overall, it costs way more, adds more friction, and threatens stability. But since software is invisible and it’s in a lot of people’s best interests that it stays a mess, it seems that it will keep going this way forever. Rumors of its impending demise from AI are premature. Hallucinating LLMs fit in nicely in an industry that doesn’t want quality. Coders will be able to churn out 10x or possibly even 1000x more crap, so it is just a more personal Stack Overflow and an infinite number of customized flaky dependencies. Far more code that is just as poor as today’s standards. It just opens the door for more magic products that claim to solve that problem. And more grunts to churn code. One more loop through an endless cycle of insanity.