One of the greatest problems with software is that it can easily become disconnected from its meaning.
There is a lot of code for some projects or functionality, but people can’t figure out what it was trying to do. It’s just a big jumble of nearly random instructions.
The original programmers may have mostly understood what it does and how it works, but they may not have been able to communicate that information to everyone who may be interested in leveraging their work.
A big problem is cryptic naming.
The programmers pick acronyms or short versions for their names instead of spelling out the full words. Some acronyms are well-known or mostly obvious, but most are eclectic and vague. They mean something to the programmers, but not to anyone else. A name that only a few people understand is not a good name.
The notion that spelling everything out to be readable is a waste of time is an unfortunate myth of the industry. Even if abbreviating saves you a few minutes of typing, it is likely to eat hours or days of somebody else's time.
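A small, hypothetical sketch of the difference (all of the names here are invented for illustration):

SUPPRESSED = 0x2  # an invented flag, purely for illustration

# Cryptic: these abbreviations meant something to the original author.
def prc_usr_rq(u, rq, fl):
    if fl & 0x2:
        return None
    return rq.get(u)

# Spelled out: a few extra keystrokes, hours of someone else's time saved.
def process_user_request(user_id, request_queue, flags):
    if flags & SUPPRESSED:
        return None
    return request_queue.get(user_id)

The two functions do exactly the same thing; only the second one can be read without interrogating its author.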
Another problem is the misuse of terminology.
There may be a long-established meaning for certain terms, but the programmers weren't fully aware of those definitions. Instead, they use the same words with a slight or significant change in meaning. Basically, they are using the words wrong. Anyone with a history in the domain will be confused or annoyed by the inappropriate usage, and it will lead other people astray.
Some programming cultures went the other way.
They end up spelling everything out in full, excessive detail, and it is the excess length of the names that makes them easily misunderstood. They throw up a wall of words that obscures the parts underneath. We don't need huge, extensive essays on how the code works, though we do need some extra information besides the code itself. Finding that balance is part of mastering programming.
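The opposite failure, again with invented names; every word is real, but the sheer length walls off the meaning:

# Spelled out to excess, the names bury the logic:
def calculate_the_final_total_amount_including_tax_for_one_order_line(
        the_number_of_units_ordered,
        the_price_of_a_single_unit,
        the_applicable_tax_rate_as_a_fraction):
    return (the_number_of_units_ordered
            * the_price_of_a_single_unit
            * (1 + the_applicable_tax_rate_as_a_fraction))

# Balanced: full words, nothing cryptic, nothing redundant.
def taxed_total(quantity, unit_price, tax_rate):
    return quantity * unit_price * (1 + tax_rate)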
Stuttering is a common symptom of severe naming problems. You’ll see parent/child relationships that have the exact same names. You never need to repeat the same string twice, but it has become rather too common to see that in code or file systems. For some technologies, it is too easy to stutter, but it's always a red flag that indicates that people didn’t take the time to avoid it. It makes you wonder what other shortcuts they took as well.
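A minimal sketch of stuttering, and of how little it takes to remove it; the names and paths are made up:

from dataclasses import dataclass

# Stuttering: the parent's name repeated on every child, as in
#
#   class Customer:
#       customer_id: int
#       customer_name: str
#       customer_address: str
#
# The enclosing type already says "customer", so the fields can drop it:

@dataclass
class Customer:
    id: int
    name: str
    address: str

The same applies to file systems: reports/reports_2025/reports_2025_final.txt says nothing more than reports/2025/final.txt.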
Ultimately a self-describing name is one that gives all of the necessary information that a qualified person needs to get an understanding or to utilize something. There is always a target audience, but it is usually far larger than most programmers are willing to admit.
If you put your code in front of another programmer and they don't get it, or they make invalid assumptions about what it does, it is likely a naming problem. You can't get help from others if they don't understand what you are trying to do, and because of its complexity, serious programming has evolved into needing teams of people rather than lone individuals.
Modern-day programming is slogging through a rather ugly mess of weird syntax, inconsistencies, awkwardness, confusion, and bugs galore. People used to take the time to make sure their work was clean and consistent, but now most of it is just ugly and half-baked, an annoyance to use. Wherever possible, we should try to avoid creating or using bad technologies; they do not make the world a better place.
Software is a static list of instructions, which we are constantly changing.
Friday, January 17, 2025
Complexity
Often, when people encounter intense complexity in their path, they decide that an oversimplification is a better choice than the truth.
In software terms, they are asked to provide a solution for a really hard, complex problem, the type of code that would choke most programmers. Instead, they avoid it and provide something that sort of, kind of, works; something a little closer to what was needed, but never really suitable.
That’s a misfitting solution. It spawns all sorts of other problems as an explosion of unaddressed fragments. So they go into firefighting mode, trying to put out all of these secondary fires, but it only gets worse.
The mess and the attempt to bring it back under control can take far more time and effort than if they had just tackled the real problem. These types of “shortcuts” are usually far longer. They become a black hole sucking in all sorts of other stuff, spinning wildly out of control. Sometimes they never, ever, really work properly. The world is littered with such systems. Half-baked road hazards, just getting in people’s way.
Now it may be that the real solution would cross some sort of impenetrable boundary and be truly impossible, but more often it just takes a long time, a lot of concentration, and needs an abstraction or two to anchor it. If you carefully look at it from the right angle it is very tractable. You just have to spend the time looking for that viewpoint.
But instead of stepping back to think, people dive in. They just apply brute force, in obvious ways, trying to pound it all into working.
If there is a lot of complexity and you try to outrun it, you'll end up with so much poor code that you'll never really get any of it to work properly. If, instead, you try to ignore it, it will return to haunt you in all sorts of other ways.
You need to understand it, then step up a level or two to find some higher ground that covers and encapsulates it. That code is abstract but workable.
It will be tough to implement, but as you see the bugs manifest, you rework the abstraction to correct the behavior. You'll get far fewer bugs, but they will be far harder to solve. You can't just toss on bandaids; each fix requires deep refactoring. Still, once solved, the bugs won't return or cascade, which ultimately makes it all a whole lot easier. It's slower in the beginning but pays off.
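A toy sketch of the difference, with invented record formats; the brute-force version grows a new branch for every case, while the abstraction pushes the variation into a small table so that new cases extend data rather than logic:

# Brute force: every new record format adds another branch, and every
# bug fix has to be repeated across all of them.
def parse_record_brute_force(line: str) -> dict:
    if line.startswith("V1|"):
        _, name, amount = line.split("|")
        return {"name": name, "amount": float(amount)}
    elif line.startswith("V2|"):
        _, name, amount, currency = line.split("|")
        return {"name": name, "amount": float(amount), "currency": currency}
    # ...a new elif for every future format...
    raise ValueError(f"unknown format: {line!r}")

# One level up: each format is just data describing its fields. Adding
# a format means extending the table, not reworking the logic.
FORMATS = {
    "V1": ("name", "amount"),
    "V2": ("name", "amount", "currency"),
}

def parse_record(line: str) -> dict:
    tag, *values = line.split("|")
    fields = FORMATS.get(tag)
    if fields is None:
        raise ValueError(f"unknown format: {tag!r}")
    record = dict(zip(fields, values))
    record["amount"] = float(record["amount"])
    return record

Bugs in the second version are fixed once, in one place, instead of once per branch.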
The complexity of a solution has to match the complexity of the problem it is trying to solve. There is no easy way around this; you can't just cheat and hope it works out. It won't. It never has.
Saturday, January 4, 2025
Data Collection
There are lots of technologies available that will help companies avoid spending time organizing their data. They let them just dump it all together, then pick it apart later.
Mostly, that hasn't worked very well. Either the mess leaves the data lost in a swamp, or the resource usage is far too extreme to offset the costs.
But it also isn’t necessary.
The data that a company acquires is data that it specifically intends to collect. It’s about their products and services, customers, internal processes, etc. It isn’t random data that randomly appears.
Mining that data for knowledge might, at very low probabilities, offer some surprises, but most likely the front line of the business already knows these things, even if they aren't being communicated well.
Companies also know the ‘structure’ of the data in the wild. It might change periodically, or be ignored for a while, but direct observations can usually describe it accurately. Strong analysis saves time.
Companies collect data in order to run at larger scales. So, with a few exceptions, sifting through that data is not exploratory. It’s an attempt to get a reliable snapshot of the world at many different moments.
There are exploratory tasks for some industries too, but these are relatively small in scope, and they are generally about searching for unexpected patterns. But this means that first, you need to know the set of expected patterns. That step is often skipped.
Mostly, data isn't exotic, it isn't random, and it shouldn't be a surprise. If there are a dozen different representations for it when it is collected, that is a mistake. Too often we get obsessed with the technology but forget about its purpose. That is always an expensive mistake.
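A minimal sketch of that discipline, with invented field names: pick one canonical representation up front and normalize every known incoming variant to it at collection time, rather than dumping a dozen shapes into storage to untangle later.

from dataclasses import dataclass
from datetime import date, datetime

@dataclass
class SaleRecord:
    # The one agreed-upon shape; everything downstream depends on it.
    customer_id: str
    amount_cents: int
    sold_on: date

def normalize(raw: dict) -> SaleRecord:
    # The known variants in the wild: amounts as integer cents or as
    # dollar floats; dates as ISO strings or epoch seconds.
    amount = raw.get("amount_cents")
    if amount is None:
        amount = round(float(raw["amount_dollars"]) * 100)

    sold = raw["sold_on"]
    if isinstance(sold, (int, float)):
        sold = datetime.fromtimestamp(sold).date()
    else:
        sold = date.fromisoformat(sold)

    return SaleRecord(str(raw["customer_id"]), int(amount), sold)

An unknown variant fails loudly here, at the point of collection, instead of quietly polluting the store.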