Thursday, January 15, 2026

This Week's Turbo Encabulator

Sometimes, the software industry tries to sell products that people don’t actually need and that won’t solve any real problems. For the vendors, these products are very profitable and need minimal support.

The classic example is the set of products that falls loosely under the “data warehouse” category.

The problem some people think they have is that if they did not collect data when they needed it, they don’t have access to it now. In a lot of cases, you can’t simply go back and reconstruct it or get it from another source; it is lost forever.

So, people started coming up with a long series of products that would let you capture massive amounts of unstructured data, then later, when you need it, you could apply some structure on top and use it.

That makes sense, sort of. Collect absolutely everything and dump it into a giant warehouse, then use it later.

The first flaw, though, is that collecting data and keeping it around for a long time is expensive. You need some storage devices, but you also need a way to tell when the data has expired. Consumer data from thirty years ago may be of interest to a historian, but is probably useless for most businesses. So, you have all of this unstructured data; when do you get rid of it? If you don’t know what the data really is, then sadly, the answer is ‘never’, which is insanely expensive.

The other obvious problem is that some of the data you have captured is ‘raw’, and some of it is ‘derived’. It is a waste of money to persist that derived stuff, since you can reconstruct it. But again, if you don’t know what you have captured, you cannot tell the two apart.
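To make those two points concrete, here is a minimal sketch of what "knowing what you have" could look like. Everything in it is hypothetical: the `Dataset` record, the `Kind` labels, the retention periods, and the rule that derived data is never worth keeping are illustrative assumptions, not a description of any real product.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum


class Kind(Enum):
    RAW = "raw"          # captured at the source; cannot be rebuilt if lost
    DERIVED = "derived"  # computed from raw data; can be regenerated


@dataclass(frozen=True)
class Dataset:
    # Hypothetical catalogue entry: the metadata you only have
    # if you worked out what the data was before storing it.
    name: str
    kind: Kind
    captured_on: date
    retention: timedelta  # how long the business actually needs it


def should_keep(ds: Dataset, today: date) -> bool:
    """Keep only raw data that is still inside its retention window."""
    if ds.kind is Kind.DERIVED:
        return False  # assumed cheaper to recompute than to store
    return today - ds.captured_on <= ds.retention


datasets = [
    Dataset("orders", Kind.RAW, date(2024, 6, 1), timedelta(days=365 * 7)),
    Dataset("monthly_totals", Kind.DERIVED, date(2024, 6, 1), timedelta(days=365 * 7)),
    Dataset("survey_1996", Kind.RAW, date(1996, 3, 1), timedelta(days=365 * 5)),
]

kept = [ds.name for ds in datasets if should_keep(ds, date(2026, 1, 15))]
print(kept)
```

Without the `kind` and `retention` fields, `should_keep` has nothing to decide on, and the only safe answer is to keep everything forever.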

The bigger problem, though, is that you now have this sea of unstructured data, and thanks to various changes over time, it does not fit together nicely. So that act of putting a structure on top is non-trivial. In fact, it is orders of magnitude more complicated now than if you had just sorted out carefully what you needed first.

Changes make it hard to stitch together, and bugs and glitches pad it out with lots and lots of garbage. The noise overwhelms the signal.

It’s so difficult to meticulously pick through it and turn it into usable information that it is highly unlikely that anyone will ever bother doing that. Or if they did it for a while, eventually they’d stop doing it. If they didn’t have the time and patience to do it earlier, then why would that change later?

So, you're burning all of these resources to prevent a problem you really shouldn’t have, in the unlikely case that someone may go through Herculean effort to get value out of it later.

If there is a line of business, then there are fundamental core data entities that underpin it. Mostly, they change slowly, but in all cases, you will always have ongoing work to keep up with any changes. You can’t escape that. If you did reasonable analysis and modelled the data correctly, then you could set up partitioned entry points for all of the data and keep teams around to stay synced to any gradual changes. In that case, you know what data is out there, you know how it is structured and what it means, and you have the appropriate systems in place to capture it. Your IT department is organized and effective.

The derived variations of this core data may go through all sorts of weird gyrations, but the fundamentals are easy enough to understand and capture. So, if you are organized and functioning correctly, you wouldn’t need an insurance technology to double up the capture ‘just in case’.

Flipped around, if you think you “need” a data warehouse in order to capture stuff that you are worried you might have missed, your actual problem is that your IT department is a disorganized disaster area. You still don’t need a warehouse; you need to fix your IT department. So someone selling you a warehouse as a solution to your IT problems is selling you snake oil. It ain’t going to help.

Now it is possible that there is a lag between ‘changes’ and the ability to update the collection of the data. So, you think that to solve that, you need a warehouse, but the same argument applies.

The changes aren’t frequent enough and really aren’t surprises, so the lag is caused by other internal issues. If you have an important line of business, and it changes every so often, then it would make sense (and is cheaper) if you just have a team waiting to jump into action to keep up with those changes. If they are not fully occupied in between changes, that is not a flaw or waste of money; they are just firefighters and need to be on standby. Sometimes you need firefighters, or a little spare capacity, or some insurance. That is reasonable resource management. Don’t place and build your house on the beach based on low tides; do it based on at least king tides or even tsunamis.

There are plenty of other products sold to enterprises that are similar. If you look at what they do, and you ask reasonable questions about why they exist, you’ll often find that the answers don’t make any real sense. The industry prefers to solve these easy problems on a rotating basis.

There will be wave after wave of questionable solutions to secondary problems that ultimately just compound the whole mess. They make it worse. Then, as people realize that they don’t work very well, a whole new wave of uselessness will hit. So long as everyone is distracted and chasing that latest wave, they will be too busy to question the sanity of what they are implementing.

Thursday, January 8, 2026

Against the Grain

When I was really young and first started coding, I hated relational databases.

They didn’t teach us much about them in university, but they were entirely dominant in enterprise development in the 80s and 90s. If you needed persistence, only an RDBMS could be considered. Any other choice, like rolling your own, files, lighter databases, or caches, was considered inappropriate. People would get mad if you used them.

My first experiences with an RDBMS were somewhat painful. The notion of declarative programming felt a bit foreign to me, and those earlier databases were cruder in their syntax.

But eventually I figured them out, and even came to appreciate all of the primary and secondary capabilities. You don’t just get reliable queries; you can use them for dynamic behaviour and distributed locking issues as well. Optimizations can be a little tiring, and you have to stay tight to normal forms to avoid piling up severe technical debt, but with practice, they are a good, strong, solid foundation. You just have to use them correctly to get the most out of them.

If you need reliable persistence (and you do) and the computations aren’t exotic (and they mostly aren’t), then relying on a good RDBMS is generally the best choice. You just have to learn a lot about the technology and use it properly, but then it works as expected.
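As a small illustration of using an RDBMS the way it was meant to be used, here is a sketch with Python’s built-in sqlite3 module. The `customer` and `purchase` tables and their columns are made up for the example; the point is only that a normalized schema with declared constraints lets the database reject bad data and answer declarative queries, rather than leaving that work to application code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default

# Hypothetical normalized schema: each fact lives in exactly one place,
# and the constraints are declared rather than checked in application code.
conn.executescript("""
    CREATE TABLE customer (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE purchase (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        amount      INTEGER NOT NULL CHECK (amount > 0)
    );
""")

conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Ada')")
conn.executemany(
    "INSERT INTO purchase (customer_id, amount) VALUES (?, ?)",
    [(1, 1200), (1, 800)],
)

# A purchase for a customer that does not exist is refused by the database.
try:
    conn.execute("INSERT INTO purchase (customer_id, amount) VALUES (99, 500)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Declarative query: say what you want, not how to loop over it.
totals = conn.execute("""
    SELECT c.name, SUM(p.amount)
    FROM customer AS c
    JOIN purchase AS p ON p.customer_id = c.id
    GROUP BY c.name
""").fetchall()

print(orphan_rejected, totals)
```

Going against the grain here would be something like stuffing all of the purchases into a single text column on `customer` and parsing it in the application; it may work for a while, but the database can no longer protect or query that data.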

If you try to use it incorrectly, you are doing what I like to call “going against the grain”. It’s an ancient woodworking expression, but it is highly fitting for technology. With the grain, you are using it as the originators intended, and against the grain, you are trying to get it to do something clever, awkward, or funny.

Sometimes people think they are clever by trying to force technology to do something unexpected. But that is always a recipe for failure. Even if you could get it to work with minimal side effects, the technology evolves, so the tricks will turn ugly.

Once you’ve matured as a programmer, you realize that clever is just asking for trouble, and usually for no good reason. Most code, most of the time, is pretty basic. At least 90% of it. Usually, it only needs to do the same things that people have been doing for at least half a century. The problem isn’t coming up with some crazy, clever new approach, but rather finding a very reasonable one in an overly short period of time, in a way that you can keep moving it forward over a long series of upgrades.

We build things, but they are not art; they are industrial-strength machinery. They need to be clean, strong, and withstand the ravages of the future. That is quality code; anything else is just a distraction.

Now, if you are pushing the state of the art for some reason, then you would have to go outside of the standard components and their usages. So, I wasn’t surprised that NoSQL came into existence, and I have had a few occasions where I both needed it and really appreciated it. ORMs are similar.

It’s just that I would not have been able to leverage these technologies properly if I didn’t already understand how to get the most out of an RDBMS. I needed to hit the limits first to gain an understanding.

So, when I saw a lot of people using NoSQL to skip learning about RDBMSes, I knew right away that it was a tragic mistake. They failed to understand that their usage was rather stock and just wanted to add cool new technologies to their resumes. That is the absolute worst reason to use any technology, ever. Or as I like to say, experiment on your own time, take your day job seriously.

In that sense, using an RDBMS for something weird is going against the grain, but skipping it for some other eclectic technology is also going against the grain. Two variations on the same problem. If you need to build something that is reliable, then you have to learn what reliable means and use that to make stronger decisions about which components to pile into the foundations. Maybe the best choice is old, and not great for your resume, but that is fine. Doing a good job is always more important.

This applies, of course, to all technologies, not just RDBMSes. My first instinct is to minimize using any external components, but if I have to, then I am looking for the good, reliable, industrial-strength options. Some super-cool, trendy, new component automatically makes me suspicious. Old, crusty, battle-scarred stuff may not look as sweet, but in most cases, it is usually a lot more reliable. And the main quality that I am looking for is reliability.

But even after you decide on the tech, you still have to find the grain and go with it. If you pick some reasonable library but then try to make it jump around in unreasonable ways, it will not end well. In the worst case, you incorrectly convince yourself that it is doing something you need, but it isn’t. Swapping out a big component at the last minute before a release is always a huge failure and tends to result in really painful circumstances. A hole that big could take years to recover from.

So, it plays back to the minimization. If we have to use a component, then we have to know how to use it properly, so it isn’t that much of a time saving, unless it is doing something sophisticated enough that learning all of that from scratch is way out of the time budget. If you just toss in components for a tiny fraction of their functionality, the code degenerates into a huge chaotic mess. You lose that connection to knowing what it will really do, and that is always fatal. Mystery code is not something you ever want to support; it will just burn time, and time is always in short supply.

In general, if you have to add a component, then you want to try to use all of its core features in a way that the authors expected you to use them. And you never want to have two contradictory components in the same system; that is really bad. Use it fully, use it properly, and get the most out of the effort it took to integrate it fully. That will keep things sane. Overall, beware of any components you rely on; they will not save you time; they may just minimize some of the learning you should have done, but they are never free.

Thursday, January 1, 2026

New Year

I was addicted from the moment I bought my first computer: a heavily used Apple ][+ clone. Computers hadn’t significantly altered our world yet, but I saw immense potential in that machine.

None of us, back in those days, could have predicted how much these machines would damage our world. We only saw the good.

And there has been lots of good; I can’t live without GPS in the car, and online shopping is often handy. I can communicate with all sorts of people I would not have been able to meet before. 

But there have also been a massive number of negative effects, from surveillance to endless lies and social divisions.

Tools are inherently neutral, so they have both good and bad uses; that is their nature. We have these incredibly powerful machines, but what have we done with them? The world is far more chaotic, way less fair, and highly polluted now. We could have used the machines to lift ourselves up, but instead we’ve let a dodgy minority use them to squeeze more money out of us. Stupid.

I’m hoping that we can turn a corner for 2026 and get back to leveraging these machines to make the world a better place. That we ignore those ambitious weasels who only care about monetizing everything and instead start to use software to really solve our rapidly growing set of nasty problems. Sure, it is not profitable, but who cares anymore? Having a lot of money while living on a burning planet isn’t really great. Less money on a happy planet is a big improvement.

The worst problem for the software industry is always trying to rush through the work, only solving redundant, trivial problems. We need to switch focus. Go slow, build up knowledge and sophistication, and ignore those people shouting at us to go faster. Good programming is slow. Slow is good. Take your time, concentrate on getting better code, and pay close attention to all of the little details. Programming is pedantic; we seem to have forgotten this.

The other thing is that we need to be far more careful about what software we write. Just say no to writing sleazy code. Not all code should exist. If they find someone else, that is not your problem, but doing questionable work because someone else might is a sad excuse. As more of us refuse, it will get a lot harder for them to achieve their goals. We can’t stop them, but at least we can slow them down a little.

The final thing to do is to forget about the twisted, messed-up history of software, at least for a moment. Think big, think grand. We have these powerhouse intellectual tools; we should be using them to lift humanity, not just for lame data entry. We need to build up stable, strong complexity that we can leverage to solve larger and larger problems. A rewrite of some crude approach with another crude approach just isn’t leveraging any of the capabilities of software. Rearranging the screens and using different widgets is only going sideways, not up. Software can remember the full context and help us make way better decisions. That is its power; we just need to start building things that truly leverage it.

Given the declining state of the world these days, it makes sense that we use this moment to shift focus. If computers got us into this mess, then they can also get us out of it. It’s been clear for quite a while that things are not going very well, but many of the people leveraging that sentiment are only doing so in order to make things worse. It’s time we change that.