Wednesday, September 10, 2025

Manifestations

The only two things in a computer are code and data.

Code is a list of instructions for a computer to follow. Data is a symbolic encoding of bits that represents something else.

In the simplest of terms, code is a manifestation of what a programmer knew when they wrote it. That’s a slight oversimplification, but not too far off.

More precisely, some code and some configuration data come directly from a programmer’s understanding.

There could be generated code as well. But in an oddball sense, the code that generated that code was the manifestation, so it is still there.
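A minimal sketch of that indirection, in Python. The field list, the template, and the function names are all hypothetical, made up for illustration. What it shows is that the generated functions contain nothing the generator's author did not already decide: that a record is a mapping, and that each field gets its own accessor.

```python
# Hypothetical field list; in a real system this might come from a schema.
FIELDS = ["name", "email"]

# The template is where the generator author's understanding is written down.
TEMPLATE = "def get_{field}(record):\n    return record[{field!r}]\n"


def generate_accessors(fields):
    # Produce Python source text: one accessor function per field.
    return "\n".join(TEMPLATE.format(field=f) for f in fields)


source = generate_accessors(FIELDS)

# Load the generated code so it can be called like hand-written code.
namespace = {}
exec(source, namespace)
```

Calling `namespace["get_name"]` on a record behaves exactly as the template dictates. If the template's assumptions are wrong, every generated function is wrong in the same way.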

Any data in the system that has not been ‘collected’ is configuration data. It was understood and placed there by someone.

These days, most code comes from underlying dependencies: libraries, frameworks, other systems, and products. Interactions with these are glued into the code. The glue code is the author’s understanding, and the dependency code is the understanding of all of the other authors who worked on it.

Wherever and however we boil it down, it comes down to something that some person understood at some point. Code does not spontaneously generate. At least not yet.

The organization and quality of the code come directly from its author. If they are disorganized, the code is disorganized. If they are confused, the code is confused. If they are rushed, the code is weak. The code is what they understand and are able to assemble as instructions for the computer to follow.

Computers are essentially deterministic machines, but the output of code is not guaranteed to be deterministic. There are plenty of direct and indirect ways of injecting non-determinism into code. Determinism is a highly valuable property; you really want it in code, where possible, because it is the anchor property for nearly all users' expectations. If the author does not understand how to preserve it, the code will not be deterministic, and it is far too easy to make mistakes.
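As a small illustration, here is a sketch in Python of the same task written both ways. The function names and the "pick a winner" scenario are made up for the example. The first version quietly depends on the clock, on shared random state, and on whatever order the caller's collection happens to iterate in. The second makes every input explicit.

```python
import random
import time


def pick_winner_nondeterministic(entrants):
    # Hidden inputs: the wall clock, the global random state, and the
    # iteration order of the collection (for a set of strings, that order
    # can change from one run of the program to the next).
    random.seed(time.time_ns())
    return random.choice(list(entrants))


def pick_winner_deterministic(entrants, seed):
    # Explicit inputs only: sorting fixes the order, and a private
    # generator seeded by the caller replaces the shared global one.
    rng = random.Random(seed)
    return rng.choice(sorted(entrants))
```

With the second version, the same entrants and the same seed always give the same winner, no matter how the entrants were stored or when the code runs. Getting there required knowing where the non-determinism was coming from in the first place.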

Code is so closely tied to the understanding of its authors that this has a lot of ramifications. The most obvious is that if you do not know something, you cannot write code to accomplish it. You can’t because you do not know what that code should be.

You can use code from someone else who knows, but if there are gaps in their knowledge or it doesn’t quite apply to your situation, you cannot really fix it. You don’t know how to fix it. You can patch over the bad circumstances that you’ve found, but if they are just a drop in a very large bucket, more will keep turning up.

As a consequence, the combined output from a large group of novice programmers will not exceed their individual abilities. It doesn’t matter how many participate; it is capped by understanding. They might be able to glue a bunch of stuff together, as learning how to glue things is a lesser skill than coding them, but all of the risks associated with those dependencies are still there and magnified by the lack of knowledge.

As mentioned earlier, a code generator is just a second level of indirection for the coding issues. It still traces back to people. Any code constructed by any automated process has the same problem, even if that process is sophisticated. Training an LLM to be a dynamic, but still automated, process does not escape this limitation. The knowledge that flowed into the code just comes from more sources, is highly non-deterministic, and rather obviously has even more risk. It’s the same as adding more novice programmers into the mix; it just amplifies the problems. We are told that enough monkeys typing randomly on typewriters could eventually generate Shakespeare, but that says nothing about the billions of monkeys you’ll need to do it, nor the effort to find that elusive needle in a rather massive haystack. It’s a tree falling in a forest with no one around.

For decades, there have been endless silver bullets launched in an attempt to separate code and configuration data from the people who need to understand them. As Frederick P. Brooks pointed out in his 1986 essay “No Silver Bullet”, it is not possible. Someone has to issue the instructions, and they cannot do that if they don’t understand them. The work in building software is acquiring that understanding; the code is just the manifestation of that effort. If you don’t do the work, you will not get the software. If you get rid of the people who did the work, you will not be able to continue the work.
