Friday, October 31, 2025

The Structure of Data

A single point of data -- one ‘variable’ -- isn’t really valuable.

If you have the same variable varying over time, then it might give you an indication of its future behavior. We like to call these ‘timeseries’.
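
As a rough sketch of that idea in Python (the field names here are just made up), a timeseries is little more than values paired with the moments they were observed:

    from dataclasses import dataclass
    from datetime import datetime

    # One observation of a single variable at a point in time.
    @dataclass
    class Observation:
        at: datetime
        value: float

    # A 'timeseries' is just those observations kept in time order.
    temperature: list[Observation] = [
        Observation(datetime(2025, 10, 1, 9, 0), 12.5),
        Observation(datetime(2025, 10, 1, 10, 0), 14.1),
    ]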

You can clump together a bunch of related variables into a ‘composite variable’. You can mix and match the types; it makes a nexus point within a bunch of arbitrary dimensions.
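
A hypothetical sketch of such a composite variable, mixing a few arbitrary types into one record:

    from dataclasses import dataclass

    # A 'composite variable': several arbitrary dimensions bundled together.
    @dataclass
    class Reading:
        sensor: str       # which device
        location: str     # where it sits
        value: float      # what it measured
        healthy: bool     # whether the device trusts itself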

If you have a group of different types of variables, such as some identifying traits about a specific person, then you can zero in on uniquely identifying one instance of that group and track it over time. You have a ‘key’ to follow it; you know where it has been. You can have multiple different types of keys for the same thing, so long as they are not ambiguous.
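
In code, following an instance by its keys often ends up as a couple of plain mappings; the person and its fields below are only illustrative:

    from dataclasses import dataclass

    @dataclass
    class Person:
        email: str      # one possible key
        tax_id: str     # another, equally valid key
        name: str

    people_by_email: dict[str, Person] = {}
    people_by_tax_id: dict[str, Person] = {}

    def track(p: Person) -> None:
        # Either key uniquely finds the same instance,
        # as long as neither is ambiguous.
        people_by_email[p.email] = p
        people_by_tax_id[p.tax_id] = p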

You might want to build up a ‘set’ of different things that you are following. There is no real way to order them, but you’d like them to stay together, always. The more data you can bring together, the more value you have collected.
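
A minimal sketch of a set; membership is the only question it answers, and order never enters into it:

    # A 'set': things kept together, with no ordering and no duplicates.
    followed = {"alice", "bob", "carol"}

    followed.add("dave")        # grows as you collect more
    print("alice" in followed)  # membership is the only real question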

If you can differentiate them along at least one given dimension, you can keep them all in an ordered ‘list’. Then you can pick out the first or last ones ahead of the others.
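
Assuming one comparable dimension, say an amount, a sketch of an ordered list might look like this:

    # A 'list' ordered along one dimension (amount, in this sketch).
    orders = [("o-17", 250.0), ("o-03", 75.5), ("o-42", 980.0)]

    orders.sort(key=lambda o: o[1])            # differentiate by the chosen dimension
    smallest, largest = orders[0], orders[-1]  # the first and last stand out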

Sometimes things pile up, with one thing above a few others. A few layers of that and we get a ‘tree’. It tends to be how we arrange ourselves socially, but it also works for breaking down categories into subcategories or combining them back up again.
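
A tiny sketch of a tree, using categories and subcategories as an assumed example:

    from dataclasses import dataclass, field

    # A 'tree': each thing sits above a few others, layer by layer.
    @dataclass
    class Category:
        name: str
        children: list["Category"] = field(default_factory=list)

    root = Category("products", [
        Category("books", [Category("fiction"), Category("history")]),
        Category("music"),
    ])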

Once in a while, it is a messy tree. The underlying subcategories don’t fit uniquely in one place. That is a ‘directed acyclic graph’ (DAG), which also lends itself to some optimizing forms of memoization.
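
A rough sketch of that, where one subcategory sits under two parents, and memoization keeps us from recomputing the shared node:

    from functools import lru_cache

    # A 'directed acyclic graph': nodes may be shared by several parents.
    parents_to_children = {
        "clothing": ["shoes", "outdoor"],
        "sports":   ["shoes", "outdoor"],   # 'shoes' fits under both
        "shoes":    [],
        "outdoor":  [],
    }

    @lru_cache(maxsize=None)   # memoize: each shared node is computed once
    def depth(node: str) -> int:
        kids = parents_to_children[node]
        return 1 + max((depth(k) for k in kids), default=0)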

When there is no hierarchical order to the whole thing it is just a ‘graph’. It’s a great way to collect things, but the flexibility means it can be dangerously expensive sometimes.

You can impose some flow, making the binary edges into directional ones. It’s a form of embedding traits into the structure itself.
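
A minimal sketch covering both of those ideas, with an adjacency structure standing in for the graph; whether an edge is directional is just whether you also record the reverse:

    from collections import defaultdict

    graph: defaultdict[str, set] = defaultdict(set)

    def add_edge(a: str, b: str, directed: bool = False) -> None:
        # Undirected: record both ways. Directed: impose a flow from a to b.
        graph[a].add(b)
        if not directed:
            graph[b].add(a)

    add_edge("alice", "bob")                       # plain graph edge
    add_edge("manager", "report", directed=True)   # flow embedded in the structure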

But the limits of a single-dimensional edge may be too imposing, so you could allow edges that connect more than two entries, which is called a ‘hypergraph’. These are rare, but very powerful.
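
A sketch of a hypergraph as nothing more than a collection of edges, each allowed to span several entries:

    # A 'hypergraph': each edge is a set that may connect more than two nodes.
    hyperedges = [
        {"alice", "bob", "carol"},   # one edge joining three people
        {"bob", "dave"},             # an ordinary two-node edge still fits
    ]

    def edges_touching(node: str) -> list[set]:
        return [e for e in hyperedges if node in e]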

We sometimes use the term ‘entity’ to refer to our main composite variables. They relate to each other within the confines of these other structures, although we look at them slightly differently in terms of, say, 1-to-N relationships, where both sides are effectively wrapped in sets or lists. It forms some expressive composite structures.
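
A hypothetical sketch of a 1-to-N relationship between two entities, with the N side wrapped in a list:

    from dataclasses import dataclass, field

    @dataclass
    class Order:
        order_id: str
        total: float

    @dataclass
    class Customer:
        customer_id: str                                   # the key for this entity
        orders: list[Order] = field(default_factory=list)  # the 'N' side

    c = Customer("c-1001")
    c.orders.append(Order("o-42", 980.0))   # one customer, many orders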

You can loosely or tightly structure data as you collect it. Loose works if you are unsure about what you are collecting; it is flexible, but costly. Tight tends to be considerably more defensive, with fewer bugs and better error handling.
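
One way to picture the difference is a loosely and a tightly structured version of the same record; the validation shown is only a sketch:

    from dataclasses import dataclass

    # Loose: anything goes; flexible, but every consumer must be defensive.
    loose_reading = {"sensor": "s-7", "value": "12.5", "extra": None}

    # Tight: the structure itself rejects garbage up front.
    @dataclass
    class TightReading:
        sensor: str
        value: float

        def __post_init__(self):
            if not self.sensor:
                raise ValueError("sensor is required")
            self.value = float(self.value)   # fail loudly, not silently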

It’s important not to collect garbage; it has no inherent value, and it causes painful ‘noise’ that makes it harder to understand the real data.

The first thing to do when writing any code is to figure out all of the entities needed and to make sure their structures are well understood. Know your data, or suffer greatly from the code spiraling out of control. Structure tends to get frozen far too quickly; just trying to duct tape over mistakes leads to massive friction and considerable wasted effort. If you misunderstood the structure, admit it and fix the structure first, then the code on top.

Monday, October 27, 2025

Fishbowl

I’ve often felt that software development projects were representative fishbowls for the rest of reality.

They have the same toxic mix that we see everywhere else, just on a smaller scale. It’s technology mixed with people mixed with business mixed with time.

Because of its turbulent history, technology is an ever-growing mess. It kinda works, but it's ugly and prone to glitches. We’ve spent decades desperately trying to avoid applying any serious engineering to our efforts.

People all have their own personal agendas that they often prioritize over the success of the collective work.

Some executives would prefer a low-quality early release so they can quickly claim success and move on. Software developers often pick inappropriate technologies to check off boxes on their resumes. All sorts of other players poke their fingers into the pie, hoping to make their careers.

People do what they think is best for themselves, which can be negative overall.

Meanwhile, they are circled by sales and business sharks hoping for a quick buck. They’ll promise anything if it will get their foot in the door. Lots of money is at stake. The software industry is slimy.

As the clock is ticking, too often the domain and engineering aspects fall to the ground. People stop caring about building solid tech that solves the user’s problem; they are more focused on their own issues.

This chaos devolves into producing a mess. Stuff tossed together too quickly for all the wrong reasons. The codebase turns to mud. It becomes a time vortex, with people desperately circling the edge, trying not to get sucked further in.

What usually works in software development is to return to the real underlying priorities. Ignore the surrounding circus. Keep producing reasonable code that goes as far as it can to really solve deep problems. It needs to be neat and consistent. A clean workspace avoids a lot of friction. All of the little details need attention. Programming involves a lot of patience.

If the codebase is solid, all the other problems remain at bay. If the codebase collapses, it opens the floodgates and lets the game get way out of control.

In that sense, it is all about controlling and minimizing the complexities. Fighting all the time to keep the artificial complexities from spawning, while making sure the inherent ones are satisfied.

Mastering this requires both a lot of knowledge and a lot of experience. It is a tricky juggling act. Keeping a million little balls in motion and off the ground while people scream at you about time.

That’s why people promising simple answers to these types of complex situations are inevitably destructive. They pick a few balls and improve those while the rest fall to the ground with a resounding thud. It seems to work right up until it inevitably collapses.

You can eliminate as much of the artificial complexity as possible, but never any of the inherent complexity. It remains and cannot be ignored. In software, you either have a reasonable codebase or you have a mess. This seems to be true elsewhere in meatspace as well.

Thursday, October 16, 2025

Patience

The biggest difference between now and when I started programming 35 years ago is patience.

Many of the people who commission software development projects are really impatient now.

The shift started with the dot-com era. There was a lot of hype about being the first into any given market. So, lots of people felt that it was better to go in early with very low quality than to wait and produce something that was well refined.

That made sense then; a lot of those early markets were brand new, and many of the attempts to establish them were total failures. So, it didn’t make a lot of sense to invest heavily in building a great piece of software if, in the end, nobody would want it anyway.

In the crater left behind, the industry shifted heavily to reactivity. Forget any sort of long-term planning or goals; just survive in the short term, throwing together whatever people say they want. That is a recipe to create a mess, but recreating that mess over and over again kept people busy.

Behind the scenes, we started sharing code a lot more. When I started coding, you had to write everything yourself. That took a long time, but if you were good, it also provided really great quality.

As more code became available, people would blindly throw in all sorts of stuff. It would bump up the functionality rapidly, but it also tended to bloat the code and leave a lot of dark corners in the codebase. They would wire up stuff that they barely understood, and it would seem to work for a while, only to end in tears.

Because of that, someone could toss together a quick demo with a few neat features that looked really promising, without understanding that a real, serious version of the same thing would require exponentially more effort. It started with websites, but quickly infected all software development. Fast-glued balls of mud became the de facto base for lots of systems, and they scale really poorly.

As the web dominated even more, and since there were so many available components whose documentation never really matured, Q&A sites emerged. If you're rushing through a piece of work, with impatient people screaming at you, you can jump online, grab some example code, and slap it in. It just amplified the problems.

Mobile phones compounded the effect. An endless stream of noise made it hard to think deeply about anything. But shallow knowledge is effectively low-quality knowledge. You might know how to combine a bunch of things together, but when it doesn’t work as expected, there is very little you can do about it, except try again.

There are all sorts of trends about scaling software, and people get sucked into believing that it should be easy, but the first major failure point is the ability of people to deal with a big, ugly, messy, poorly constructed codebase. You will never get any sort of effective or reasonable behavior out of a pile of stuff that you don’t understand. Scaling requires deep knowledge, but impatience prevents us from acquiring that.

So I find it frustrating now. People run around making great claims about their software, but most of it is ugly, bloated, and buggy. We’re an industry that prioritizes marketing over engineering.

My favorite jobs were decades ago, back in what was at least the golden age of programming for me. Long before the central requirement became “just whack it out, we’ll fix it later”. What you don’t understand is a bug; it just may not have manifested yet.

Thursday, October 9, 2025

Experimentation

There are two basic ways of writing software code: experimentation and visualization.

With experimentation, you add a bunch of lines of code, then run it to see if it worked. As it is rather unlikely to work the first time, you modify some of the code and rerun. You keep this up until a) you have all the code you need and b) it does what you expect it to do.

For visualization, you think about what the code needs to do first. Maybe you draw a few pictures, but really, the functionality of the code is in your head. You are “seeing” it in some way. Then, once you are sure that that is the code you need, you type it out line by line to be as close as you can to the way you imagined it. After you’ve fixed typos and syntactic problems, the code should behave in the way you intended.

Experimentation is where everyone starts when they learn programming. You just have to keep trying things and changing them until the code behaves in the way you want it to.

What’s important, though, is that if the code does not work as expected, which is common, you dig a little to figure out why it didn’t work. Learn from failure. But some people will just keep making semi-random changes to the code, hoping to stumble on a working version.

That isn’t so bad where there are only a small number of permutations; you end up visiting most of them. But for bigger functionality, there can be a massive number of permutations, and in some cases, it is effectively infinite. If you are not learning from each failure, it could take an awfully long time before you stumble upon the right changes. By not learning something from each failure, you cap your abilities at fairly small pieces of code.

Instead, the best approach is to hypothesize about what will happen each time before you run the code. When the outcome differs, and it mostly will, you use that difference as a reason to dig into what’s underneath. Little by little, you will build up a stronger understanding of what each line of code does, what they do in combination, and how you can better leverage them. Randomly changing things and ignoring the failures wastes a lot of time and skips the learning you need to do.

Visualization comes later, once you’ve started to build up a strong internal model of what’s happening underneath. You don’t have to write code to see what happens; instead, you can decide what you want to happen and then just make the code do that. This opens the door not only to writing bigger things, but also to writing far more sophisticated things. A step closer to mastering coding.

Experimentation is still a necessity, though. Bad documentation, weak technologies, weird behaviours; modern software is a mess and getting a little worse each year. As long as we keep rushing through the construction, we’ll never get a strong, stable foundation. We’re so often building on quicksand these days.

Thursday, October 2, 2025

The Value of Thought

You can randomly issue millions of instructions to a computer.

It is possible that when they are executed, good things will happen, but the odds of that are infinitesimally small.

If you need a computer to do anything that is beyond trivial, then you will need a lot of carefully constructed instructions to make it succeed.

You could try to iterate your way into getting these instructions by experimentation, using trial and error. For all of the earlier iterations just before the final successful one, though, some amount of the included instructions will essentially be random, so as initially stated, the odds that you blunder into the right instructions are tiny.

Instead, even if you are doing some experimentation, you are doing that to build up an internal understanding of how the instructions relate back to the behaviors of the computer. You are building a mental model of how those instructions work.

To be good at programming, you end up having to be good at acquiring this knowledge and using it to quickly build up models. You have to think very carefully about what you are seeing, how it behaves, and what you’d prefer it to have done instead.

These thoughts allow you to build up an understanding that is then manifested as code, which are the instructions given to the computer.

Which is to say that ‘coding’ isn’t the effort, thinking is. Coding is the output from acquiring an understanding of the problem and a possible solution to it. The software is only as good as the thoughts put into it.

If you approach the work too shallowly, then the software will not fit all of the expected behaviours. If the problems to be solved are deep and complex, then the knowledge needed to craft a good solution will also be deep and complex.

We see and acknowledge the value of the existing code, essentially as a form of intellectual property, but we are not properly valuing the knowledge, skills, time, and deep thinking that are necessary to have created such code. Software is only as good as the understanding of the programmers who created it. If they are clueless, the software is close to random. If they only understand a small part of what they are doing, the missing knowledge gets randomized.

The quality of software is the quality of the thoughts put into it by everyone who contributed to it. If the thinking diminishes over time due to turnover, the quality will follow suit. If the original authors lack the abilities or understanding, the quality will follow suit.

So we can effectively mark out zero quality as being any set of random permutations that maximizes the incorrect behaviors, or bugs, as we like to call them.

But we can also go the other way and say that a very small set of permutations that makes reasonable behavioral tradeoffs while converging very close to zero deficiencies (both in the code itself and in its behavior) is the highest achievable quality. You can only achieve high quality if you’ve taken the time to really understand each and every aspect of what behavior is necessary. The understanding of the authors would have to be nearly full and complete, with no blind spots. That is a huge amount of knowledge, which takes a long time to acquire, and needs a group of people to hold and apply, which is why we don’t see software at that high quality level very often.

We value artwork correctly, though. A gifted artist’s work is not merely the value of the canvas, the frame, and the pigments applied. It is all that went into the artist's life that drove them to express their feelings in a particular painting. The Mona Lisa is a small canvas, but has great value, well beyond its physical presence.

Code is the same way. A talented and super knowledgeable group of people can come together to craft something deep and extremely useful. Its usefulness and value go far beyond the code; it comes from the thoughts that were built up in order to bring it into existence.

When that is forgotten, people stop trying to think deeply, and the quality plummets as a direct result. Thought is valuable; code is just proof that it happened.