Thursday, April 23, 2026

Users

A common misconception in software development comes from not understanding users.

For any piece of software, there is a group of primary users who use it to solve their problems. These usages range from small commercial products all the way to large enterprise systems.

If the software is large and has been evolving for quite some time, these usages are often partitioned into different subgroups. Some use one feature set, others use a different one. Some users are tied directly to their group, and others will overlap between a bunch of groups.

There is usually a second set of users whose primary tasks are system management, usually related to the data in the software, but sometimes configuration, access, or feature capabilities. They are occasionally referred to as data administrators.

They most often sit outside of the primary users and do not use the software to solve those problems; they are just involved in making sure the software itself is capable of allowing the other users to get their work done.

Often totally ignored, there is actually a third set of users. These users take the software, install it on the underlying hardware, and interact with it when there are problems. Sometimes they don’t even know what the software does, but they are still responsible for providing the platform for the software to exist and fulfil its role.

It used to be that this was classified as an Operations Department, but over the decades, a lot of Software Developers have often been directly involved as well. These are users, too. They should have their own ops, test, or development accounts; they have complex access issues; and sometimes they have to be able to get into the software to determine whether it is working or malfunctioning in a specific way, though most times they should not be able to see what the data administrators can see.

Traditionally, people have not designated operations as users, which is a common mistake. They do “use” the software, and getting their work done is also dependent on it. It’s just that they are not focused on using the specific features or managing the data.

If you take a step back, then user requirements, and even user stories, should cover all three groups, not just the first or second one.

But it’s also true that if an enterprise has hundreds of software packages running, from an operations perspective, all of them share a large number of common requirements. They all need to be installed and upgraded, they all need to be monitored, and they all need some form of smoke tests for emergencies. An operations dept could put out a list of mandatory common user requirements long before any specific software project was a twinkle in someone’s eye.

What’s also true is that for the most part, these types of requirements do not change significantly with most tech stacks. The specifics may change a bit, but the base requirement is locked in stone.

This is a rather classic mistake that happens with new tech stacks or operations environments. Because they are new, everyone thinks they can start over again from scratch and ignore any previous round of complexities. However, once things get going, those same complexities come back to haunt the effort, and because they were ignored, they get handled extra poorly.

So, we see people put up software systems without any adequate monitoring, for example, and are surprised when the users complain about the system being down. Pushing the monitoring back onto the first and second user groups is common now, but it still makes the effort look rather amateur. Operations should be the first to know about a crash; they just can’t detect more subtle bugs buried deep under big features.
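The generic nature of those operations requirements can be made concrete. Below is a minimal sketch of a smoke-test harness of the sort an operations group could mandate for every system, long before any specific project exists. All the check names and bodies here are purely hypothetical placeholders:

```python
# A minimal smoke-test runner: an operations-facing check that verifies the
# platform is up without touching user features or administrator data.

def run_smoke_tests(checks):
    """Run each named check; report pass/fail, treating any exception as a fail."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    return results

# Hypothetical checks an operations team might register for any system:
checks = {
    "process_responds": lambda: True,          # e.g. ping a health endpoint
    "disk_has_space": lambda: 1_000_000 > 0,   # e.g. compare free bytes to a floor
}
print(run_smoke_tests(checks))
```

The point of the shape is that the harness is identical across hundreds of software packages; only the registered checks differ per system.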

The users of software are anybody who interacts directly with that software, in any way. Non-users, while they still may be “stakeholders” in the effort, will never run the software, test it, log into it, or trigger some of the features.

Someone may be responsible for making sure the software project gets done on time, but if they do not interact with the software, they are not a user.

User requirements should have special priority above almost all other aspects of the work. They should only take a back seat when there are overriding cost or time constraints. But it really should be written down somewhere that the users did not get the feature or functionality they needed due to the enforcement of these constraints. At minimum, that builds up a wonderful list of future enhancements that should be considered as early in the effort as possible. The key point, though, is that knowing what users need in order to actually solve their problems is different from the specifics of a technical solution implementation.

Thursday, April 16, 2026

Strange Loops

I remember when I was in school, we got a difficult programming assignment that I struggled with. The algorithm, if I remember correctly, was to fill an oddly shaped virtual swimming pool to different levels, and then calculate the volume of water. The bottom of the pool was a series of wavy curves.

The most natural way to write the code to simulate adding little bits of water at a time to the pool was with recursion. No doubt, you could unroll that into a loop and maybe a stack or two, but we were supposed to use recursion; it was the key to the assignment.
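I don’t remember the exact details of the assignment anymore, but the shape of the solution, recursively spreading little bits of water outward until you hit the walls, can be sketched as a flood fill over a grid. The pool layout here is just an illustration, not the original problem:

```python
def fill(grid, x, y, seen):
    """Recursively flood-fill the open cells reachable from (x, y).

    Returns the number of unit cells of water added. '#' marks the
    pool walls and wavy bottom; '.' is open space below the water line.
    """
    if (x, y) in seen:
        return 0
    if x < 0 or y < 0 or y >= len(grid) or x >= len(grid[0]):
        return 0
    if grid[y][x] == '#':
        return 0
    seen.add((x, y))
    # The function calls itself once for each neighbouring cell.
    return 1 + sum(fill(grid, x + dx, y + dy, seen)
                   for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)))

# An oddly shaped pool, filled from the top-left corner of the water line.
pool = ["..........",
        "#...##...#",
        "##.####.##",
        "##########"]
volume = fill(pool, 0, 0, set())
print(volume)  # → 18
```

Unrolling this into a loop means managing an explicit stack of pending cells yourself; the recursive form lets the call stack do that bookkeeping.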

I had never encountered recursion before, and I found it mystifying. A function calls itself? That just seemed rather odd and crazy. Totally non-intuitive. I coded it up, but it never really worked properly. I got poor marks on that assignment.

Many months later, the light went on. It came out of nowhere, and suddenly, not only did recursion make sense, but it also seemed trivial. Over the decades, I’ve crafted some pretty intense recursive algorithms for all sorts of complex problems. It now comes intuitively, and I find it far easier to express the answer recursively than to unroll it.

Recursion is, I think, a simple version of what Douglas Hofstadter refers to in GEB as a ‘strange loop’; a paradox is a more advanced variety. For me, it is a conceptual understanding that sort of acts like a cliff. When you are at the bottom, it makes no sense; it seems like a wall, but if you manage to climb up, it becomes obvious.

There are all sorts of strange loops in programming. Problems where the obvious first try is guaranteed to fail, yet there is another simpler way to encode the logic that always works.

A good example is parsing. The world is littered with what I call string-split parsers. They are the most intuitive way to decompose some complex data. You just start chopping the data into pieces, then you look at those pieces and react. For very simple data that works fine, but if you fall into programming languages, or pretty much anywhere where there are some inherent ambiguities, it will fail miserably.
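To make the contrast concrete, here is a tiny recursive-descent parser for arithmetic expressions, sketched as an illustration rather than production code. Nesting and operator precedence are exactly what a string-split approach cannot handle reliably: chopping "1+(2+3)*4" on '+' destroys the structure, while the grammar-driven version handles it directly:

```python
# Grammar:  expr := term ('+' term)* ; term := factor ('*' factor)* ;
#           factor := DIGIT | '(' expr ')'
# Each grammar rule becomes a function; nesting is handled by recursion
# (factor calls back up into expr for parenthesised sub-expressions).

def parse(tokens):
    def expr(i):
        value, i = term(i)
        while i < len(tokens) and tokens[i] == '+':
            rhs, i = term(i + 1)
            value += rhs
        return value, i

    def term(i):
        value, i = factor(i)
        while i < len(tokens) and tokens[i] == '*':
            rhs, i = factor(i + 1)
            value *= rhs
        return value, i

    def factor(i):
        if tokens[i] == '(':
            value, i = expr(i + 1)
            return value, i + 1        # skip the closing ')'
        return int(tokens[i]), i + 1   # single-digit number

    return expr(0)[0]

print(parse(list("1+(2+3)*4")))  # → 21, not the 13 a left-to-right chop would give
```

A real parser would add a proper tokenizer and error reporting, but the recursive structure is the whole trick: the grammar’s nesting maps directly onto the call stack.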

But all you really need to do to climb this cliff is read the Green Dragon book. It gives you all of the practical knowledge you need to implement a parser, but also to understand any of the complicated parser generators, like Antlr or Yacc.

I guess the cool part is that once you have encountered and conquered a particular strange loop, that knowledge is so fundamental that it transcends tech stacks. If you can write a solid parser in C, you can write it in any language. If you understand how to write parsers, you can learn any programming language way faster. And nothing about that knowledge will radically change over the course of your career. You’ll jump around from different domains and stacks, but you always find that the same strange loops are waiting underneath.

A slight change over the decades was that more and more of the systems programming aspects of building software ended up in libraries or components. While that means that you’ll have fewer opportunities to implement strange loops yourself for actual production systems, you really still do need to understand how they work inside the components to leverage them properly. Being able to pound out a parser and an AST does help you understand some of the weirdness of SQL, for example. You intuitively get what a query plan is, and with a bit of understanding of relational algebra and indexing, how it is applied to the tables to satisfy your request. You’ll probably never have enough time in your life to write your own full database, but at least now you can leverage existing ones really well.

I’m not sure it’s technically a strange loop, but there is a group of mathematical solutions that I’ve often encountered that I refer to as ‘basis swaps’ that are similar. Essentially, you have a problem with respect to one basis, but in order to solve it, you need to find an isomorphism to some other basis, swap it, partially solve it there, then swap all or part of it back to the original basis. This happens in linear programming and exponential curve fitting, and it seemed to be the basis of Andrew Wiles’s solution to Fermat’s Last Theorem. But I’ve also played around with this for purely technical formal systems, such as device-independent document rendering. I guess ASTs and cross-compilers are there too, as are language-specific VMs like the JVM.
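One small, concrete instance of a basis swap is fitting an exponential curve. You can’t solve y = a·e^(bx) directly with linear least squares, but take logs, and ln(y) = ln(a) + b·x is an ordinary line: swap into log space, solve the linear problem there, then swap the intercept back. A minimal sketch:

```python
import math

def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) by least squares in log space."""
    n = len(xs)
    ls = [math.log(y) for y in ys]          # swap into the log basis
    mean_x = sum(xs) / n
    mean_l = sum(ls) / n
    # Ordinary least-squares slope and intercept on (x, ln y).
    b = sum((x - mean_x) * (l - mean_l) for x, l in zip(xs, ls)) \
        / sum((x - mean_x) ** 2 for x in xs)
    a = math.exp(mean_l - b * mean_x)       # swap the intercept back
    return a, b

# Exact synthetic data with a=2, b=0.5 recovers the parameters.
xs = [0, 1, 2, 3]
ys = [2.0 * math.exp(0.5 * x) for x in xs]
a, b = fit_exponential(xs, ys)
print(round(a, 6), round(b, 6))  # → 2.0 0.5
```

The caveat, as with any basis swap, is that the transformed problem is not quite the original one: least squares in log space weights the errors differently than least squares on the raw data, which is the “partially solve it there” part of the pattern.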

What I’ve seen in practice is that some programmers, when confronted with strange-loop type problems, go looking for shortcuts instead of diving in and trying to understand the actual problem. I do understand this. There is so much to learn in basic software development that you really don’t want to have to keep going off and reading huge textbooks. But I also know that trying to cheat a solution to a strange loop is a massive time waste. You’re always just another bug away from making it work correctly, but it will never work correctly. The best choice if you don’t have the time to do it properly is always not to do it. Instead, find someone else who knows or use something else where it already exists and is of pretty decent quality.

There are very few strange loops in most applications programming, although there are knowledge swamps like schema normalization that cause similar grief. Once you stumble into systems programming, even if it is just a simple cache, a wee bit of locking, or the necessity of transactional integrity, you run into all sorts of sticky problems that really do require existing knowledge to resolve them permanently.

Strange loops are worth learning. They don’t change with the trends or stacks, and they’ll enable you to be able to write, use, or leverage any of the software components or tools floating around out there. Sure, they slow you down a bit when you first encounter them, but if you bother to jump those hurdles, you’re lightning fast when you get older.

Thursday, April 9, 2026

The Quality Bars

For any given software development, there are a bunch of ‘bars’ that you have to jump over, which relate to quality.

At the bottom, there is a minimum quality bar. If the code behaves worse than this, the project will be immediately cancelled. Someone who is watching the money will write the whole thing off as incompetence. To survive, you need to do better.

A little higher is the acceptable quality bar. That is where both management and the users may not be happy about the quality, but the project will definitely keep going. It may face increased scrutiny, and there will probably be lots of drama.

Above that is the reasonable quality bar. The code does what it needs to, in the way people expect it to behave. There are bugs, of course, but none of them are particularly embarrassing. Most of them exist for a short time, then are corrected. The total number of known long-term outstanding ones is in the single or double digits. There are probably several places in the code where people think “we should have ...”

Then we get into the good quality bar. Bugs are rare; there are very few regrets. People like using the code; it will stay around for a long, long time. Its weakness isn’t what’s already there; it is making sure future changes don’t negate that value.

There is a great quality bar too. The code is solid, dependable, and can be used as a backbone for all sorts of other stuff. It’s crafted with a level of sophistication that keeps making it useful even for surprising circumstances. People can browse the code and get an immediate sense of what it does and why it works so well.

Above that, there is an excellent quality bar, where the code literally has no known defects. It was meticulously crafted for a very clear purpose and is nearly guaranteed to do exactly that, and nothing but that. It’s the type of code that lives can depend on.

There is a theoretical ‘perfect’ quality bar, too, but it is unreachable. It’s asymptotic.

Getting to the next bar is usually at least 10x more work than getting to that lower bar; the scale is clearly exponential. If it costs 1 just to get to minimum, then it’s 10 for acceptable, and 100 for reasonable. Roughly. This occurs because the higher bars need people to continually revisit and review the effort and aggressively refactor it, over and over again. Code that you splat out in a few hours is usually just minimum quality. Maybe if you’ve written the same thing a few times already, you can start at a higher bar, but that’s unreliable. To get up to those really high bars means having more than one author; it has to be a group of people with an intense focus, all working in sync with up-to-date collective knowledge. Excellent code has a guarantee that it will not deviate from expectations, one that you can rely on, so it’s far more than just a few lines of code.

A great deal of the code out there in libraries and frameworks falls far short of being reasonable. You might not be affected by that, as it’s often code that is sitting idle in little-used features. Still, you have to see it as a landmine waiting to go off when someone tries to push the boundaries of its usage. Code that has been battle tested for decades can generally get near the good bar, but there is always a chance that some future version will fall way, way back.

The overall quality of a codebase is really its lowest bar. So if someone splats some junk into an excellent project, if it’s ever triggered, it can pull down everything else below acceptable. This is the Achilles heel of plugins, as a few poor ones getting popular can cause a lot of damage to perceived quality.

Thursday, April 2, 2026

Outlines

A software system is a finite resource.

For some people, this might be a surprising statement. They might feel that, since it can store a massive amount of data and talk to any other system in the world, it is a lot closer to infinite.

At any given time, there is a specific quantity of hardware, wires, and electricity. If more of these resources are available than are currently being consumed, they are still finite. It’s space to grow, but still limited.

Anything that operates within this finite boundary is in itself finite. Sure, it is always changing, usually growing, but despite its massive and somewhat unimaginable size, it is still finite.

If, even in its immense size and complexity, all software is finite, then any one given system within this is also finite.

A software system has fixed boundaries. It does exactly one set of things. Parts of the code may be dynamic, so they have huge expressive capabilities, but there are still very fixed boundaries on exactly how large those are. It may be permutations greater than all of the particles in the known universe, but it still has a limit.

Time might appear to change that, but the period of time for which any given piece of software will be able to run is also finite. Someday it will come to an end. The hardware will disintegrate, the sun may go supernova, or humanity may just blow itself up. More likely, though, it will just get upgraded and essentially become something else.

Given all of this, any given software system in existence, or as imagined, has a very sharp boundary that defines it. In that sense, since it is composed of code and configurations, those precisely dictate what it can and cannot do.

You can go outside of this boundary and write tests that confirm 100% of these lines. It’s just that, given the ability to have dynamic code, it may take far, far longer to precisely test those behaviours than the lifetime of the system itself. Still, even though it is vague, due to time and complexity, the tests form an encapsulating boundary on the outside of the system.

The same is true for specifications. You could precisely specify any and all behaviours within the software system, but to get it entirely precise, the specifications would have to have a direct one-to-one correspondence with the lines of code and configurations. That would effectively make the specifications an Nth generation language that is directly compilable into a 3rd-generation one, or even lower. Because of this, some people equate precise specifications to the code itself, seeing the code as just a specification for the runtime instances of the software.

An exact specification is almost a proof of correctness. I suspect that proofs need a bit more in that they are driven by the larger context of the expectations. They are also generally only applied to algorithms, not the system as a whole.

So, all of this gives us a bunch of different ways to draw the outlines of a system.

On top of this, there are plenty of vague ways to draw or imagine them as well. Requirements and user stories are popular means. As well, one could perceive the system by its dual, which is the arrangement and flow of its data. You can more easily describe an inventory system, for example, by the data it holds, the way that data is interacted with, and how it flows from and to other systems. While the dual in this case is more abstract, it is also considerably simpler than specifying the functionality or the code.

Another way is to see the system by how people interact with it, essentially as a set of features that people use to solve their problems. If those features are effectively normalized, it too is a simpler representation, but if they have instead been arbitrarily evolved over years of releases, they probably have become convoluted.

One key point with all of these different outline types is that some are much better at describing certain parts of the expectations for the behaviour than others. You might need a proof of correctness for some tiny critical parts, but a rather vague outline is suitable for everything else.

No one representation fits everything perfectly, which even applies to the code. If the code was constructed over a long period of time, by people without strong habits for keeping it organized, it has likely degenerated into spaghetti and isn’t easily relatable to the other outlines. People may have changed their expectations and grown accustomed to the behaviours, but that still doesn’t make it the best possible representation for what they needed.

In practice, the best choice is often to half-do a bunch of different types of outlines, then set it all running and repair the obvious deficiencies. While this obviously doesn’t ensure high quality or rigorous mapping to the expectations, it is likely the cheapest form of creating complex software systems. It’s just that it is also extremely high-risk and prone to failure.

Thursday, March 26, 2026

Cogtastic

Since I started programming decades ago, there has been one seriously annoying trend that just does not seem to want to go away.

If you work for an enterprise, banging away at their internal systems, the management above you really, really, really wants you to just sit there, do your job, and not cause any problems. They want you to be an obedient little cog. Just a part of the machine that they direct to satisfy their goals.

The utter failure with that is that you are in the trenches. And to build even halfway decent stuff, you have to understand a huge amount about the technology and the domain. If you mix in some empathy for the actual users, then the results are usually pretty good. You’ve solved real problems for real people, and they are usually thankful.

But the managers sitting way up high on the hill are disconnected from all of that. They don’t care about users or technology; they care about budgets, politics, and promotions. That’s not a surprise; that is the world they are forced to live in.

But it is a fly in the ointment, since their games are entirely disconnected from the users' lives. And you’re caught in the middle.

In the best circumstances, management enables you to find an appropriate balance between all sides and still keep mostly to some crazy artificial schedule. They trust you, and they listen to your concerns. You’re not a cog, you’re a critical part of the construction process.

But there are very few higher-up managers who actually subscribe to that perspective. Most, instead, believe that they were anointed as the boss and that their will supersedes all other concerns. These are the people who want you to be a quiet, obedient cog. “Just shut up and do your job.”

From my direct experience, it has always been a disaster. It was what was failing in the 70s, 80s, 90s, and the early turn of the century with software development. It wasn’t “Waterfall” that deviated the work from where it needed to be; it was the people in charge who mindlessly went off in the wrong direction. If they had been paying attention, they would have been course correcting as needed, following whatever chaotic changes tumbled out of the fray. In those days when things failed, they failed at the top, but somehow it was the weight of the process that got blamed.

From where I have often sat, the “root problem” has and will always be a lack of understanding. The people driving and working on the development project get disconnected from the people who will ultimately use the output. If you don’t know what the user’s problem really is, how can you build any type of solution that will help them? You have to dig into that; it’s not optional.

The desire for the development teams to just be mindless cogs comes directly from that cogtastic viewpoint. There is some type of made-up schedule, but the programmers keep surfacing ignored or forgotten issues. If management indulges, then the schedule gets blown. Instead of blaming analysis or a lack of understanding, it is just easier to suppress the feedback and keep on going as planned, hoping that some good luck happens somewhere along the way.

I’m sure there are lots of examples out there of out-of-control projects getting saved at the last minute by some Wesley who manages to Save the Ship and thus avoid impending disaster. That’s great, but highly unreliable, and the people who do this never get any of the credit they deserve for avoiding the original doomed fate. It’s only when they are gone that people realize that they were quietly avoiding the cliffs, while staying nearly on course.

So, you get this situation where someone is “in charge,” and they want their will to be manifested as and how they decide, but they do not truly have the right objectives to achieve success.

This takes us back to AI. Instead of being some cool new tool to lift the quality of the work we're doing, it seeks to turn the developers themselves into those disconnected managers. So they generate whackloads of code which they don’t understand, and don’t care about, because they are trying to impress the higher-ups with their “velocity”. What is actually happening gets ignored, and what is really needed gets ignored, too. Now, instead of them being the cog, it is some AI agent who fills that role, and the whole cycle repeats.

The act of engineering some large complex system involves both understanding how it works and why it is necessary. Nothing can escape that. In the Waterfall days, people blamed the heavy processes, but making them super light and hugely reactive did nothing to fix the problems. Now, we’ll see the same effect play out again. The developers will just be agent managers, and the auto-cogs below them will pile up more of the wrong stuff. You still have the wrong stuff even if you get it faster, and there is a lot more of it to add to the mudball, which is worse.

The moral of the story is that people just don’t seem to be learning the same lesson that keeps rearing its ugly head over and over again. A vague notion of what you want is not enough to build it. The real work is in taking that loose idea and understanding it at a very deep level so that you can then actuate it into the large number of parts you need that all work together as expected.

The people in the middle who pull off this feat are not and will never be cogs. What’s in their heads, and what they know, is the essence of being able to pull this thing out of the ether and into reality. Their job is to understand it all and then find a way to turn that understanding into physical bits. They are good at it if what is produced matches everyone’s expectations: the users, management, technologists, etc. There is this codebase that, when built and deployed, becomes a real solution to all of these problems. That codebase, and its boundaries as code, a specification, tests, or proofs, is just a manifestation of the software developers’ actual understanding. If, in the middle of doing that, it clicks in that part of the ask is too ambiguous, way off the mark, impossible, or just crazy, it’s the developer’s understanding that is the mission-critical part of success. Address that, and it probably works. Ignore that, and it definitely fails.