Thursday, February 19, 2026

Data Collection

One of the strongest abilities of any software is data collection. Computers are stupid, but they can remember things that are useful.

It’s not enough to just have some widgets display it on a screen. To collect data means that it has to be persisted for the long term. The data survives the programming being run and rerun, over and over again.

But it’s more than that. If you collect data that you don’t need, it is a waste of resources. If you don’t collect the data that you need, it is a bug. If you keep multiple copies of the same data, it is a mistake. The software is most useful when it always collects just what it needs to collect.

And it matters how you represent that data. Each individual piece of data needs to be properly decomposed. That is, if it is two different pieces of information, it needs to be collected into two separate data fields. You don’t want to over-decompose, and you don’t want to clump a bunch of things together.

Decomposition is key because it allows the data to be properly typed. You don’t want to collect an integer as a string; it could be misinterpreted. You don’t want a bunch of fields clumped together as unstructured text. Data in the wrong format opens up the door for it to be misinterpreted as information, causing bugs. You don’t want mystery data, each datam should have a self-describing label that is unambiguous. If you collect data that you can not interpret correctly, then you have not collected that information.

If you have the data format correct, then you can discard invalid junk as you are collecting it. Filling a database with junk is collecting data you don’t need, and if you did that instead of getting the data you did need, it is also a bug.

Datam are never independent. You need to collect data, and that data has a structure that binds together all of the underlying datam correctly. If you downgrade that structure, you have lost the information about it. If you put the data into a broader structure, you have opened up the possibility of it getting filled with junk data. For example, if the relationship between the data is a hierarchical tree, then the data needs to be collected as a tree; neither a list nor a graph is a valid collection.

In most software, most of the data is intertwined with other values. If you started with one specific piece of data, you should be able to quickly navigate to any of the others. That means that you have collected all of the structures and interconnections properly, and you have not lost any of them. There should only be one way to navigate, or you have collected redundant connections.

As such, if you have collected all of the data you need, then you can validate it. There won’t be data that is missing, there won’t be data that is junk. You can write simple validations that will ensure that the software is working properly, as expected. If the validations are difficult, then there is a problem with the data collection.

If you collect all of the data you need for the software correctly, then writing the code on top of it is way simpler and far easier to properly structure. The core software gets the data from persistence, then passes it out to some form of display. It may come back with some edits, which need to be updated in the persistence. There may be some data that you did not collect, but the data you did collect is enough to be able to derive it from a computation. There may be tricky technical issues that are necessary to support scaling, but those are independent from the collection and flow of data.

Collecting data is the foundation of almost all software. If you get it right, you will be able to grow the software to gradually cover larger parts of the problem domain. If you make a mess out of it, the code will get really ugly, and the software will be unreliable.

Thursday, February 12, 2026

Blockers

Some days the coding goes really smoothly. You know what you need; you lay out a draft version, which happens nicely. It kinda works. You pass over it a bunch of times to bang it properly into position. A quick last pass to enhance its readability for later, and then out the door it goes.

Sometimes, there is ‘friction’. You start coding, but you have to keep waiting on other things. So, it’s code a bit, set it aside, code a bit, etc. The delays can be small, but they add up and interfere with the concentration and sense of accomplishment.

Some friction comes from missing analysis. There was something you should have known, but it fell through the cracks. Some comes from interactions with others. You need something from your teammates, or you need it from some other external group.

With some issues for external groups, it will take lots of time to escalate it, arrange introductory meetings, get to the issue, and then finally come to a resolution. You can kinda fake the code a little in the meantime, but that is usually throw-away work, so you’d prefer to minimize it. If you are patient, it will eventually get done.

Occasionally, though, there is a ‘blocker’. It is unpassable. You started to work on something, but it was shut down. You are no longer able to work on it. It’s a dead end.

One type of blocker is that someone else is doing the same work. You were going to write something, but it turns out they got there first or have some type of priority. In some cases, that is fine, but sometimes you feel that you could have done a much better job at the effort, which is frustrating. Their code is limiting.

Another type is knowledge-based. You need something, but it is far too complex or time-consuming for others to let you write it.

Some code is straightforward. But some code requires buckets of very specific knowledge first, or the code will become a time sink. People might stop you from writing systems programming components like persistence, or domain-specific languages, or synchronization, for example. Often, that morphs into a buy-versus-build decision. So something similar exists; you feel you could do it yourself, but they purchase it instead, and the effort to integrate it is ugly. If you don’t already have that knowledge, you dodged a bullet, but if you do have it, it can be very frustrating to watch a lesser component get added into the mix when it could have been avoided with just a bit of time.

There are fear-based blockers as well. People get worried that doing something a particular way may just be another time sink, so they stop it quickly. That is often the justification for brute force style coding, for example. They’d rather run hard and pound it all out as a mess than step back and work through it in a smart way. In some shops, the only allowable code is glue, since they are terrified of turnover.

In that sense, blockers are usually about code. You have it, you need it, where is it going to come from? Are you allowed to write something or not? With knowledge, you can usually do the work to figure it out, or at least approximate it, but there could be some secret knowledge that you really need to move forward, but are fully blocked from getting it, although that is extremely rare.

If you flip that around, when you're building a medium-sized or larger system, the big issue is where is the code for it going to come from? In that sense, building software is the work of getting all of the code you need together in one organized place. Some of it exists already, some of it you have to create yourself.

In the past, the biggest concern about pre-existing code was always ‘support’. You don’t want to build on some complex component only to have it crumble on you, and there is nothing you can do about it. That is an expensive mistake. So, if you aren’t going to write it yourself, then who is going to support it, and how good is that support?

If you follow that, then you generally come to understand that as you build up all of this code, support is crucial. It’s not optional, and it is foolish to assume the code is bug-free and will always work as expected.

It’s why old programmers like to pound out a lot of stuff themselves; they know when doing that, they can support their own code, and they know that that doesn’t waver until they leave the project. The support issue is resolved.

It’s also why most wise programmers don’t just add in any old library. They’ve had issues with little dodgy libraries that were poorly supported in the past, so they have learned to avoid them. Big, necessary components are unavoidable, but the little odd ones are not. If you can’t find a legitimate version of something, doing it yourself is a much better choice.

Which brings us all of the way around to vibe coding. If you’ve been around a while, then nothing seems like a worse idea than having the ability to dynamically generate unsupported code. Tonnes of it.

Particularly if it is complex and somewhat unlimited in depth.

A whack load of boilerplate might be okay; at least you can read and modify it, although a debugger would still likely be necessary to highlight the problem, so it can mean a lot of work recreating the issue. So, it might only be a short-term time saver, but a nasty landmine waiting for later. Supportable, but costly.

But it would be heartbreaking to generate 100K in code, which is almost usable but entirely unsupportable. If you did it in a week, you’d probably just have to live with the flaws or spend years trying to pound out the bugs.

Not surprisingly, people tried this often in the past. They built sophisticated generators, hit the button and got full, ready-to-go applications. You don’t see any of these around anymore, since the support black holes they formed consumed them and everything else around them, so they essentially eradicated the evidence of their existence. It was tried, and it failed miserably.

But even more interesting was that those older application generators were at least deterministic. You could run them ten times, and mostly get back the same code. With vibe coding, each run is a random turkey shoot. You’ll get something different. So, extra unsupportable, and extra crazy.

If you are going to build a big system to solve a complex problem, then you need to avoid any and all blockers that get in your way. Friction can slow you down, but a blocker is often fatal.

These days, you’re not really ‘writing’ the system, so much as you are ‘assembling it’. If you do that from too many unsupportable subparts, then the whole will obviously be unsupportable. Inevitably, if you put something into a production environment, you either have to be prepared to support it somehow or move on to the next gig. But if too much unsupportable crud gets out there, that next gig may be even worse than the one that you tried to flee from.

Thursday, February 5, 2026

Systems Thinking

There are two main schools of thought in software development about how to build really big, complicated stuff.

The most prevalent one, these days, is that you gradually evolve the complexity over time. You start small and keep adding to it.

The other school is that you lay out a huge specification that would fully work through all of the complexity in advance, then build it.

In a sense, it is the difference between the way an entrepreneur might approach doing a startup versus how we build modern skyscrapers. Evolution versus Engineering.

I was working in a large company a while ago, and I stumbled on the fact that they had well over 3000 active systems that were covering dozens of lines of business and all of the internal departments. It had evolved this way over fifty years, and included lots of different tech stacks, as well as countless vendors. Viewed as ‘one’ thing it was a pretty shaky house of cards.

It’s not hard to see that if they had a few really big systems, then a great number of their problems would disappear. The inconsistencies between data, security, operations, quality, and access were huge across all of those disconnected projects. Some systems were up-to-date, some were ancient. Some worked well, some were barely functional. With way fewer systems, a lot of these self-inflicted problems would just go away.

It’s not that you could cut the combined complexity in half, but more likely that you could bring it down to at least one-tenth of what it is today, if not even better. It would function better, be more reliable, and would be far more resilient to change. It would likely cost far less and require fewer employees as well. All sorts of ugly problems that they have now would just not exist.

The core difference between the different schools really centers around how to deal with dependencies.

If you had thousands of little blobs of complexity that were all entirely independent, then getting finished is just a matter of banging out each one by itself until they are all completed. That’s the dream.

But in practice, very few things in a big ecosystem are actually independent. That’s the problem.

If you are going to evolve a system, then you ignore these dependencies. Sort them out afterwards, as the complexity grows. It’s faster, and you can get started right away.

If you were going to design a big system, then these dependencies dictate that design. You have to go through each one and understand them all right away. They change everything from the architecture all the way down to the idioms and style in the code.

But that means that all of the people working to build up this big system have to interact with each other. Coordinate and communicate. That is a lot of friction that management and the programmers don’t want. They tend to feel like it would all get done faster if they could just go off on their own. And it will, in the short-term.

If you ignore a dependency and try to fix it later, it will be more expensive. More time, more effort, more thinking. And it will require the same level of coordination that you tried to avoid initially. Slightly worse, in that the time pressures of doing it correctly generally give way to just getting it done quickly, which pumps up the overall artificial complexity. The more hacks you throw at it, the more hacks you will need to hold it together. It spirals out of control. You lose big in the long-term.

One of the big speed bumps preventing big up-front designs is a general lack of knowledge. Since the foundations like tech stacks, frameworks, and libraries are always changing rapidly these days, there are few accepted best practices, and most issues are incorrectly believed to be subjective. They’re not, of course, but it takes a lot of repeated experience to see that.

The career path of most application programmers is fairly short. In most enterprises, the majority have five years or less of real in-depth experience, and battle-scared twenty-year+ vets are rare. Mostly, these novices are struggling through early career experiences, not ready yet to deal with the unbounded, massive complexity present in a big design.

Also, the other side of it is that evolutionary projects are just more fun. I’ve preferred them. You’re not loaded down with all those messy dependencies. Way fewer meetings, so you can just get into the work and see how it goes. Endlessly arguing about fiddly details in a giant spec is draining, made worse if the experience around you is weak.

Evolutionary projects go very badly sometimes. The larger they grow, the more likely they will derail. And the fun gives way to really bad stress. That severe last-minute panic that comes from knowing that the code doesn't really work as it should, and probably never will. And the longer-term dissatisfaction of having done all that work to ultimately just contribute to the problem, not actually fix it.

Big up-front designs are often better from a stress perspective. A little slow to start and sometimes slow in the middle, they mostly smooth out the overall development process. You’ve got a lot of work to do, but you’ve also got enough time to do it correctly. So you grind through it, piece by piece, being as attentive to the details as possible. Along the way, you actively look for smarter approaches to compress the work. Reuse, for instance, can shave a ton of code off the table, cut down on testing, and provide stronger certainty that the code will do the right thing in production.

The fear that big projects will end up producing the wrong thing is often overstated. It’s true for a startup, but entirely untrue for some large business application for a market that’s been around forever. You don’t need to burn a lot of extra time, breaking the work up into tiny fragments, unless you really don’t have a clue what you are building. If you're replacing some other existing system, not only do you have a clue, you usually have a really solid long-term roadmap. Replace the original work and fix its deficiencies.

There should be some balanced path in the middle somewhere, but I haven’t stumbled across a formal version of it after all these decades.

We could go first to the dependencies, then come up with reasons why they can be temporarily ignored. You can evolve the next release, but still have a vague big design as a long-term plan. You can refactor the design as you come across new, unexpected dependencies. Change your mind, over and over again, to try to get the evolved works to converge on a solid grand design. Start fast, slow right down, speed up, slow down again, and so forth. The goal is one big giant system to rule them all, but it may just take a while to get there.

The other point is that the size of the iterations matters, a whole lot. If they are tiny, it is because you are blindly stumbling forward. If you are not blindly stumbling forward, they should be longer, as it is more effective. They don’t have to all be the same size. And you really should stop and take stock after each iteration. The faster people code, the more cleanup that is required. The longer you avoid cleaning it up, the worse it gets, on basically an exponential scale. If you run forward like crazy and never stop, the working environment will be such a swamp that it will all grind to an abrupt stop. This is true in building anything, or even cooking in a restaurant. Speed is a tradeoff.

Evolution is the way to avoid getting bogged down in engineering, but engineering is the way to ensure that the thing you build really does what it is supposed to do. Engineering is slow, but spinning way out of control is a heck of a lot slower. Evolution is obviously more dynamic, but it is also more chaotic, and you have to continually accept that you’ve gone down a bad path and need to backtrack. That is hard to admit sometimes. For most systems, there are parts that really need to be engineered, and parts that can just be allowed to evolve. The more random the evolutionary path, the more stuff you need to throw away and redo. Wobbling is always expensive. Nature gets away with this by having millions of species, but we really only have one development project, so it isn’t particularly convenient.