Thursday, February 5, 2026

Systems Thinking

There are two main schools of thought in software development about how to build really big, complicated stuff.

The most prevalent one, these days, is that you gradually evolve the complexity over time. You start small and keep adding to it.

The other school is that you lay out a huge specification that would fully work through all of the complexity in advance, then build it.

In a sense, it is the difference between the way an entrepreneur might approach doing a startup versus how we build modern skyscrapers. Evolution versus Engineering.

I was working in a large company a while ago, and I stumbled on the fact that they had well over 3000 active systems covering dozens of lines of business and all of the internal departments. It had evolved this way over fifty years, and included lots of different tech stacks, as well as countless vendors. Viewed as ‘one’ thing, it was a pretty shaky house of cards.

It’s not hard to see that if they had a few really big systems, then a great number of their problems would disappear. The inconsistencies between data, security, operations, quality, and access were huge across all of those disconnected projects. Some systems were up-to-date, some were ancient. Some worked well, some were barely functional. With way fewer systems, a lot of these self-inflicted problems would just go away.

It’s not just that you could cut the combined complexity in half; more likely you could bring it down to one-tenth of what it is today, if not less. It would function better, be more reliable, and would be far more resilient to change. It would likely cost far less and require fewer employees as well. All sorts of ugly problems that they have now would just not exist.

The core difference between the different schools really centers around how to deal with dependencies.

If you had thousands of little blobs of complexity that were all entirely independent, then getting finished is just a matter of banging out each one by itself until they are all completed. That’s the dream.

But in practice, very few things in a big ecosystem are actually independent. That’s the problem.

If you are going to evolve a system, then you ignore these dependencies. Sort them out afterwards, as the complexity grows. It’s faster, and you can get started right away.

If you were going to design a big system, then these dependencies dictate that design. You have to go through each one and understand them all right away. They change everything from the architecture all the way down to the idioms and style in the code.

But that means that all of the people working to build up this big system have to interact with each other. Coordinate and communicate. That is a lot of friction that management and the programmers don’t want. They tend to feel like it would all get done faster if they could just go off on their own. And it will, in the short-term.

If you ignore a dependency and try to fix it later, it will be more expensive. More time, more effort, more thinking. And it will require the same level of coordination that you tried to avoid initially. Slightly worse, in that the time pressures of doing it correctly generally give way to just getting it done quickly, which pumps up the overall artificial complexity. The more hacks you throw at it, the more hacks you will need to hold it together. It spirals out of control. You lose big in the long-term.

One of the big speed bumps preventing big up-front designs is a general lack of knowledge. Since the foundations like tech stacks, frameworks, and libraries are always changing rapidly these days, there are few accepted best practices, and most issues are incorrectly believed to be subjective. They’re not, of course, but it takes a lot of repeated experience to see that.

The career path of most application programmers is fairly short. In most enterprises, the majority have five years or less of real in-depth experience, and battle-scarred twenty-plus-year vets are rare. Mostly, these novices are struggling through early career experiences, not yet ready to deal with the unbounded, massive complexity present in a big design.

Also, the other side of it is that evolutionary projects are just more fun. I’ve preferred them. You’re not loaded down with all those messy dependencies. Way fewer meetings, so you can just get into the work and see how it goes. Endlessly arguing about fiddly details in a giant spec is draining, made worse if the experience around you is weak.

Evolutionary projects go very badly sometimes. The larger they grow, the more likely they will derail. And the fun gives way to really bad stress. That severe last-minute panic that comes from knowing that the code doesn't really work as it should, and probably never will. And the longer-term dissatisfaction of having done all that work to ultimately just contribute to the problem, not actually fix it.

Big up-front designs are often better from a stress perspective. A little slow to start and sometimes slow in the middle, they mostly smooth out the overall development process. You’ve got a lot of work to do, but you’ve also got enough time to do it correctly. So you grind through it, piece by piece, being as attentive to the details as possible. Along the way, you actively look for smarter approaches to compress the work. Reuse, for instance, can take a ton of code off the table, cut down on testing, and provide stronger certainty that the code will do the right thing in production.

The fear that big projects will end up producing the wrong thing is often overstated. It’s true for a startup, but entirely untrue for some large business application for a market that’s been around forever. You don’t need to burn a lot of extra time, breaking the work up into tiny fragments, unless you really don’t have a clue what you are building. If you're replacing some other existing system, not only do you have a clue, you usually have a really solid long-term roadmap. Replace the original work and fix its deficiencies.

There should be some balanced path in the middle somewhere, but I haven’t stumbled across a formal version of it after all these decades.

We could go first to the dependencies, then come up with reasons why they can be temporarily ignored. You can evolve the next release, but still have a vague big design as a long-term plan. You can refactor the design as you come across new, unexpected dependencies. Change your mind, over and over again, to try to get the evolving work to converge on a solid grand design. Start fast, slow right down, speed up, slow down again, and so forth. The goal is one big giant system to rule them all, but it may just take a while to get there.

The other point is that the size of the iterations matters, a whole lot. If they are tiny, it is because you are blindly stumbling forward. If you are not blindly stumbling forward, they should be longer, as it is more effective. They don’t have to all be the same size. And you really should stop and take stock after each iteration. The faster people code, the more cleanup that is required. The longer you avoid cleaning it up, the worse it gets, on basically an exponential scale. If you run forward like crazy and never stop, the working environment will be such a swamp that it will all grind to an abrupt stop. This is true in building anything, or even cooking in a restaurant. Speed is a tradeoff.

Evolution is the way to avoid getting bogged down in engineering, but engineering is the way to ensure that the thing you build really does what it is supposed to do. Engineering is slow, but spinning way out of control is a heck of a lot slower. Evolution is obviously more dynamic, but it is also more chaotic, and you have to continually accept that you’ve gone down a bad path and need to backtrack. That is hard to admit sometimes. For most systems, there are parts that really need to be engineered, and parts that can just be allowed to evolve. The more random the evolutionary path, the more stuff you need to throw away and redo. Wobbling is always expensive. Nature gets away with this by having millions of species, but we really only have one development project, so it isn’t particularly convenient.

Thursday, January 29, 2026

Reap What You Sow

When I first started programming, some thirty-five years ago, it was a somewhat quiet, if not shy, profession. It had already been around for a while, but wasn’t visible to the public. Most people had never even seen a serious computer, just the overly expensive toys sold at department stores.

Back then, to get to an intermediate position took about 5 yrs. That would enable someone to build components on their own. They’d get to senior around 10 yrs, where they might be expected to create a medium-sized system by themselves, from scratch. 20 yrs would open up lead developer positions for large or huge projects, but only if they had the actual experience to back it up. Even then, a single programmer might spend years to get their code size up to medium, so building larger systems required a team.

Not only did the dot-com era rip programming from the shadows, but the job positions also exploded. Suddenly, everyone used computers, everyone needed programmers, and they needed lots of them. That disrupted the old order, but never really replaced it with anything sane.

So, we’d see odd things like someone getting promoted to a senior position after just 2 or 3 years of working; small teams of newbie programmers getting tasked with building big, complex systems with zero guidance; various people with no significant coding experience hypothesising about how to properly build stuff. It’s been a real mess.

For most computer systems, to build them from scratch takes an unreasonably large amount of knowledge and skill, not only about the technical aspects, but also the domain problems and the operational setups, too.

The fastest and best way to gain that knowledge is from mentoring. Courses are good for the basics, but the practice of keeping a big system moving forward is often quite non-intuitive, and once you mix in politics and bureaucracy, it is downright crazy.

If you spend years in the trenches with people who really get what they are doing, you come up to speed a whole lot faster.

We have a lot of stuff documented in books and articles, and some loose notions of best practices, but that knowledge is often polluted with trendy advice, bent incorrectly by people trying to monetize it.

There has always been a big difference between what people say should be done and what they actually do successfully.

Not surprisingly, after a couple of decades of struggling to put stuff together, the process knowledge is nearly muscle memory and somewhat ugly. It’s hard to communicate, but you’ve learned what has really worked versus what just sounds good. That knowledge is passed by word of mouth, and it’s that knowledge that you really want to learn from a mentor. It ain’t pretty, but you need it.

As a consequence, it is no surprise that the strongest software development shops have always had a good mix of experience. It is important. Kids and hardened vets, with lots of people in the middle. It builds up a good environment and follows loosely from the notion that ‘birds of a feather flock together’. That type of experience diversity is critical, and when it comes together, the shop can smoothly build any type of software it needs to build. Talent attracts talent.

That’s why, when we see it crack and there is a staffing landslide where a bunch of experienced devs all leave at the same time, it often takes years or decades to recover. Without a strong culture of learning and engineering, it’s hard to attract and keep good people; the shop stays understaffed, and the turnover is crazy high.

There are always more programming jobs than qualified programmers; it seems that never really changes.

Given that this has been an ongoing problem in the industry for half a century, we can see how AI may make it far worse. If companies stop hiring juniors because their intermediates are using AI to whack out that junior-level code, that handoff of knowledge will die. As the older generations leave without passing on any process knowledge, it will eventually be the same as only hiring a bunch of kids with no guidance. AI won’t help prevent that, and its output will be degraded from training on those fast-declining standards.

We’ve seen that before. One of the effects of the dot-com era was that the lifespan of code shrank noticeably. The older code was meant to run for decades; the new stuff is often just replaced within a few years after it was written. That’s part of why we suddenly needed more programmers, but also why the cost of programming got worse. It was offset somewhat by having more libraries and frameworks available, but because they kept changing so fast, they also helped shorten the lifespan. Coding went from being slowly engineered to far more of a performance art. The costs went way up; the quality declined.

If we were sane, we’d actually see the industry go the other way.

If we assume that AI is here to stay for coders, then the most rational thing to do would be to hire way more juniors, and let them spend lots of time experimenting and building up good ways to utilize these new AI tools, while also getting a chance to learn and integrate that messy process knowledge from the other generations. So instead of junior positions shrinking, we’d see an explosion of new junior positions. And we’d see vets get even more expensive.

That we are not seeing this indicates either a myopic management effect or that AI itself really isn’t that useful right now. What seems to be happening is that management is cutting back on payroll long before the intermediates have successfully discovered how to reliably leverage this new toolset. They are jumping the gun, so to speak, and wiping out their own dev shops as an accidental consequence. It will be a while before they notice.

This has happened before; software development often has a serious case of amnesia and tends to forget its own checkered history. If it follows the older patterns, there will be a few years of decreasing jobs and lower salaries, followed by an explosion of jobs and huge salary increases. They’ll be desperate to undo the damage.

People incorrectly tend to think of software development as one-off projects instead of continually running IT shops. They’ll do all sorts of short-term damage to squeeze value, while losing big overall.

Having lived through the end of programming as we know it a few dozen times already, I am usually very wary of any of these hype cycles. AI will eventually find its usefulness in a few limited areas of development, but it won’t happen until it has become far more deterministic. Essentially, random tools are useless tools. Software development is never a one-off project, even if that delusion keeps persisting. If you can’t reliably move forward, then you are not moving forward. At some point, the ground just drops out below you, sending you back to square one.

The important point at the high level is that you set up and run a shop to produce and maintain the software you need to support your organization’s goals. The health of that shop is vital, since if it is broken, you can’t really keep anything working properly. When the toolset changes, it would be good if the shop can leverage it, but it is up to the people working there to figure it out, not management.

Thursday, January 22, 2026

Dirty Little Secret

At the beginning of this century, an incredibly successful software executive turned to me and said, “The dirty little secret of the software industry is that none of this stuff really works.”

I was horrified.

"Sure, some of that really ancient stuff didn’t work very well, but that is why we are going to replace it all with all of our flashy new technologies. Our new stuff will definitely fix that and work properly," I replied.

I’ve revisited this conversation in my head dozens of times over the decades. That new flashy stuff of my youth is now the ancient crusty stuff for the kids. It didn’t work either.

Well, to be fair, some parts of it worked just enough, and a lot of it stayed around hidden deep below. But it's weird and eclectic, and people complain about it often, try hard to avoid it, and still dream of replacing it.

History, it seems, did a great job of proving that the executive was correct. Each new generation piles another mess on top of all of the previous messes, with the dream of getting it better, this time.

It’s compounded by the fact that people rush through the work so much faster now.

In those days, we worked on something for ‘months’; now the expectation is ‘weeks’.

Libraries and frameworks were few and far between, but that actually afforded us a chance to gain more knowledge and be more attentive to the little details. The practice of coding software keeps degrading as the breadth of technologies explodes.

The bigger problem is that even though the hardware has advanced by orders and orders of magnitude, the effectiveness of the software has not. It was screens full of awkward widgets back then; it is still the same. Modern GUIs have more graphics, but behave worse than before. You can do more with computers these days, but it is far more cognitively demanding to get it done now. We didn’t improve the technology; we just started churning it out faster.

Another dirty little secret is that there is probably a much better way to code things, one that was more commonly known when it was first discovered back in the 70s or 80s. Most programmers prefer to learn from scratch, forcing them to resolve the same problems people had decades ago. If we keep reinventing the same crude mechanics, it is no surprise that we haven’t advanced at all. We keep writing the same things over and over again while telling ourselves that this time it is really different.

I keep thinking back to all of those times, in so many meetings, where someone was enthusiastically expounding the virtues of some brand new, super trendy, uber cool technology, and essentially claiming “this time, we got it right”, while knowing that if I wait for another five years, the tides will turn and a new generation will be claiming that the old stuff doesn’t work.

“None of this stuff really works” got stuck in my head way back then, and it keeps proving itself correct.

Thursday, January 15, 2026

This Week's Turbo Encabulator

Sometimes, the software industry tries to sell products that people don’t actually need and that won’t solve any real problems. They are very profitable and need minimal support.

The classic example is the set of products that falls loosely under the “data warehouse” category.

The problem some people think they have is that if they did not collect data when they needed it, they don’t have access to it now. In a lot of cases, you can’t simply go back and reconstruct it or get it from another source; it is lost forever.

So, people started coming up with a long series of products that would let you capture massive amounts of unstructured data, then later, when you need it, you could apply some structure on top and use it.

That makes sense, sort of. Collect absolutely everything and dump it into a giant warehouse, then use it later.

The first flaw, though, is that collecting data and keeping it around for a long time is expensive. You need some storage devices, but you also need a way to tell when the data has expired. Consumer data from thirty years ago may be of interest to a historian, but is probably useless for most businesses. So, you have all of this unstructured data; when do you get rid of it? If you don’t know what the data really is, then sadly, the answer is ‘never’, which is insanely expensive.

The other obvious problem is that some of the data you have captured is ‘raw’, and some of it is ‘derived’. It is a waste of money to persist the derived stuff, since you can always reconstruct it. But again, if you don’t know what you have captured, you cannot tell the two apart.
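As a minimal sketch of that distinction (the entity and field names here are hypothetical, purely for illustration): only the raw record is worth persisting, since the derived values can be recomputed from it at any time.

```python
from dataclasses import dataclass

@dataclass
class RawTrade:
    # Raw data: captured once, can never be reconstructed if lost.
    quantity: int
    price: float
    fee_rate: float

def notional(t: RawTrade) -> float:
    # Derived data: recomputable from the raw record whenever needed.
    return t.quantity * t.price

def total_cost(t: RawTrade) -> float:
    # Also derived; persisting values like this just pads out the warehouse.
    return notional(t) * (1 + t.fee_rate)
```

In an unstructured dump, both kinds land side by side, with nothing marking which is which.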

The bigger problem, though, is that you now have this sea of unstructured data, and thanks to various changes over time, it does not fit together nicely. So that act of putting a structure on top is non-trivial. In fact, it is orders of magnitude more complicated now than if you had just sorted out carefully what you needed first.

Changes make it hard to stitch together, and bugs and glitches pad it out with lots and lots of garbage. The noise overwhelms the signal.

It’s so difficult to meticulously pick through it and turn it into usable information that it is highly unlikely that anyone will ever bother doing that. Or if they did it for a while, eventually they’d stop doing it. If they don’t have the time and patience to do it earlier, then why would that change later?

So, you’re burning all of these resources to prevent a problem you really shouldn’t have, in the unlikely case that someone will later go through a Herculean effort to get value out of it.

If there is a line of business, then there are fundamental core data entities that underpin it. Mostly, they change slowly, but you will always have ongoing work to keep up with any changes. You can’t escape that. If you did reasonable analysis and modelled the data correctly, then you could set up partitioned entry points for all of the data and keep teams around to stay synced to any gradual changes. In that case, you know what data is out there, you know how it is structured and what it means, and you have the appropriate systems in place to capture it. Your IT department is organized and effective.
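As a rough sketch of what a modelled, partitioned entry point might look like (the entity, fields, and routing rule are all hypothetical): the structure is decided up front, so every record that arrives is already understood.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class CustomerOrder:
    # A core entity with a known structure and a known meaning.
    order_id: str
    customer_id: str
    business_line: str
    order_date: date
    total: float

def partition_key(order: CustomerOrder) -> str:
    # Partitioned entry point: each line of business lands in its own
    # partition, owned by a team that keeps it synced as the business changes.
    return f"{order.business_line}/{order.order_date:%Y-%m}"
```

The point is not the specific shape, but that the structure and ownership are decided before the data arrives, not excavated afterwards.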

The derived variations of this core data may go through all sorts of weird gyrations, but the fundamentals are easy enough to understand and capture. So, if you are organized and functioning correctly, you wouldn’t need an insurance technology to double up the capture ‘just in case’.

Flipped around, if you think you “need” a data warehouse in order to capture stuff that you worried you might have missed, your actual problem is that your IT department is a disorganized disaster area. You still don’t need a warehouse; you need to fix your IT department. So someone selling you a warehouse as a solution to your IT problems is selling you snake oil. It ain’t going to help.

Now it is possible that there is a lag between ‘changes’ and the ability to update the collection of the data. So, you think that to solve that, you need a warehouse, but the same argument applies.

The changes aren’t frequent enough and really aren’t surprises, so the lag is caused by other internal issues. If you have an important line of business, and it changes every so often, then it would make sense (and is cheaper) if you just have a team waiting to jump into action to keep up with those changes. If they are not fully occupied in between changes, that is not a flaw or waste of money; they are just firefighters and need to be on standby. Sometimes you need firefighters, or a little spare capacity, or some insurance. That is reasonable resource management. Don’t place and build your house on the beach based on low tides; do it based on at least king tides or even tsunamis.

There are plenty of other products sold to enterprises that are similar. If you look at what they do, and you ask reasonable questions about why they exist, you’ll often find that the answers don’t make any real sense. The industry prefers to solve these easy problems on a rotating basis.

There will be wave after wave of questionable solutions to secondary problems that ultimately just compound the whole mess. They make it worse. Then, as people realize that they don’t work very well, a whole new wave of uselessness will hit. So long as everyone is distracted and chasing that latest wave, they will be too busy to question the sanity of what they are implementing.

Thursday, January 8, 2026

Against the Grain

When I was really young and first started coding, I hated relational databases.

They didn’t teach us much about them in university, but they were entirely dominant in enterprise development throughout the 80s and 90s. If you needed persistence, only an RDBMS could be considered. Any other choice, like rolling it yourself, flat files, lighter databases, or caches, was considered inappropriate. People would get mad if you used them.

My first experiences with an RDBMS were somewhat painful. The notion of declarative programming felt a bit foreign to me, and those earlier databases were cruder in their syntax.

But eventually I figured them out, and even came to appreciate all of the primary and secondary capabilities. You don’t just get reliable queries; you can use them for dynamic behaviour and for distributed locking as well. Optimizations can be a little tiring, and you have to stay tight to normal forms to avoid piling up severe technical debt, but with practice, they are a good, strong, solid foundation. You just have to use them correctly to get the most out of them.
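As an example of leaning on those secondary capabilities, here is a minimal sketch of letting the database itself coordinate exclusive work across processes, assuming PostgreSQL and the psycopg2 driver; the lock key and the shape of the work are hypothetical.

```python
import psycopg2

LOCK_KEY = 42  # hypothetical application-defined lock id

def run_exclusive(dsn, work):
    # Let the RDBMS arbitrate: pg_try_advisory_xact_lock grants the lock to
    # only one session at a time and releases it when the transaction ends.
    conn = psycopg2.connect(dsn)
    try:
        with conn, conn.cursor() as cur:
            cur.execute("SELECT pg_try_advisory_xact_lock(%s)", (LOCK_KEY,))
            (got_lock,) = cur.fetchone()
            if got_lock:
                work(cur)  # critical section runs inside the same transaction
            return got_lock
    finally:
        conn.close()
```

The coordination rides on the transactional guarantees the database already provides, rather than on something bolted on beside it, which is exactly what going with the grain looks like.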

If you need reliable persistence (and you do) and the computations aren’t exotic (and they mostly aren’t), then relying on a good RDBMS is generally the best choice. You just have to learn a lot about the technology and use it properly, but then it works as expected.

If you try to use it incorrectly, you are doing what I like to call “going against the grain”. It’s an ancient woodworking expression, but it is highly fitting for technology. With the grain, you are using it as the originators intended, and against the grain, you are trying to get it to do something clever, awkward, or funny.

Sometimes people think they are clever by trying to force technology to do something unexpected. But that is always a recipe for failure. Even if you could get it to work with minimal side effects, the technology evolves, so the tricks will turn ugly.

Once you’ve matured as a programmer, you realize that clever is just asking for trouble, and usually for no good reason. Most code, most of the time, is pretty basic. At least 90% of it. Usually, it only needs to do the same things that people have been doing for at least half a century. The problem isn’t coming up with some crazy, clever new approach, but rather finding a very reasonable one in an overly short period of time, in a way that you can keep moving it forward over a long series of upgrades.

We build things, but they are not art; they are industrial-strength machinery. They need to be clean, strong, and withstand the ravages of the future. That is quality code; anything else is just a distraction.

Now, if you are pushing the state of the art for some reason, then you would have to go outside of the standard components and their usages. So, I wasn’t surprised that NoSQL came into existence, and I have had a few occasions where I both needed it and really appreciated it. ORMs are similar.

It’s just that I would not have been able to leverage these technologies properly if I didn’t already understand how to get the most out of an RDBMS. I needed to hit the limits first to gain an understanding.

So, when I saw a lot of people using NoSQL to skip learning about RDBMSes, I knew right away that it was a tragic mistake. They failed to understand that their usage was rather stock and just wanted to add cool new technologies to their resumes. That is the absolute worst reason to use any technology, ever. Or as I like to say, experiment on your own time, take your day job seriously.

In that sense, using an RDBMS for something weird is going against the grain, but skipping it for some other eclectic technology is also going against the grain. Two variations on the same problem. If you need to build something that is reliable, then you have to learn what reliable means and use that to make stronger decisions about which components to pile into the foundations. Maybe the best choice is old, and not great for your resume, but that is fine. Doing a good job is always more important.

This applies, of course, to all technologies, not just RDBMSes. My first instinct is to minimize using any external components, but if I have to, then I am looking for the good, reliable, industrial-strength options. Some super-cool, trendy, new component automatically makes me suspicious. Old, crusty, battle-scarred stuff may not look as sweet, but in most cases, it is usually a lot more reliable. And the main quality that I am looking for is reliability.

But even after you decide on the tech, you still have to find the grain and go with it. If you pick some reasonable library but then try to make it jump around in unreasonable ways, it will not end well. In the worst case, you incorrectly convince yourself that it is doing something you need, but it isn’t. Swapping out a big component at the last minute before a release is always a huge failure and tends to result in really painful circumstances. A hole that big could take years to recover from.

So, it plays back to the minimization. If we have to use a component, then we have to know how to use it properly, so it isn’t that much of a time saving, unless it is doing something sophisticated enough that learning all of that from scratch is way out of the time budget. If you just toss in components for a tiny fraction of their functionality, the code degenerates into a huge chaotic mess. You lose that connection to knowing what it will really do, and that is always fatal. Mystery code is not something you ever want to support; it will just burn time, and time is always in short supply.

In general, if you have to add a component, then you want to try to use all of its core features in a way that the authors expected you to use them. And you never want to have two contradictory components in the same system; that is really bad. Use it fully, use it properly, and get the most out of the effort it took to integrate it. That will keep things sane. Overall, beware of any components you rely on; they will not save you time. They may just minimize some of the learning you would otherwise have had to do, but they are never free.