Sunday, October 11, 2020

Programming

The point of programming is not to issue a fixed set of static instructions to the computer. Typing in commands, or clicking with the mouse, to execute a series of specific instructions is ‘using’ the computer, not ‘programming’ it.

Programming is when you assemble a set of instructions that has at least one variable that can change for each execution. That is, it is at least one step ‘higher’ than just using the machine.


If there is some ‘part’ of the instructions that can vary, then you can reuse those instructions in many different circumstances, each time specifying a new set of ‘values’ for any variables.
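
To make the distinction concrete, here is a minimal sketch in Python (the `greet` function and the names passed to it are invented purely for illustration):

```python
# 'Using' the computer: a fixed set of static instructions.
# Every execution does exactly the same thing.
print("Hello, Alice")

# 'Programming': the instructions contain a variable, so one
# set of instructions can be reused for many executions.
def greet(name):
    print(f"Hello, {name}")

# Each call supplies a new 'value' for the variable.
greet("Alice")
greet("Carol")
```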


So, you can visualize a program as being a ‘forest’ of different possible instruction executions, one of which you might choose to set in motion. 


It’s worth noting that computers are ‘discrete’: there is always a fixed boundary for each and every variable, even if the number of possible permutations is massive.


So, we can talk about the ‘size’ of the program’s forest as a ‘space’ of all possible executions. If you have one integer variable in your program, then there are as many different outcomes as there are representable integers, [min-integer..max-integer]. If several variables depend on each other, every combination is a distinct outcome, so the size of the space is multiplicative. If the variables are fully independent, their behaviors can be considered separately, so the size is only additive.
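
As a rough sketch of that arithmetic (the three-variables-of-ten-values setup is hypothetical):

```python
values_per_variable = [10, 10, 10]

# Dependent variables interact, so every combination is a
# distinct outcome: 10 * 10 * 10 = 1000.
dependent_outcomes = 1
for n in values_per_variable:
    dependent_outcomes *= n

# Fully independent variables never interact, so their
# behaviors just add: 10 + 10 + 10 = 30.
independent_outcomes = sum(values_per_variable)

print(dependent_outcomes, independent_outcomes)  # 1000 30
```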


Quite obviously, if you have a lot of variables, the number of possible permutations is unbelievably large, bigger than we can conceptualize. But we can reason abstractly about many aspects of these possible spaces.


A key insight is that even a single user has some variance. Their tasks vary each time. They might want to use the system for their own constrained forest of work. In that case, it would not make much sense to write two or more programs to satisfy their requirements, particularly since each execution is similar to the others. One program that varies in the same way that they vary is a good, tight fit.


We can see that this gets a little more complicated when we consider a larger group of users. If they slightly expand the forest, then we can still craft the code to satisfy all of them with one program. If, however, they cluster into different independent sets, then it is tempting to write one program per set. But limiting the variations like that is unlikely to be the fastest way to complete ‘all’ of the code for ‘all’ of the users. Depending on the forests, it’s a question of whether adding new variables is more or less expensive than crafting somewhat redundant programs. If the forests are nicely aligned, then observing that alignment can provide a higher-level means of encapsulating the variability. The obvious use of this is for generic tools like editors and spreadsheets. They cover billions of users, but they do so at the cost of segmenting a very specific subset of the work into a siloed program.
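
As a tiny, hypothetical illustration of that trade-off, two user clusters can be covered either by two redundant programs or by one program with one extra variable:

```python
# One redundant program per user cluster.
def report_for_sales(rows):
    return sorted(rows, key=lambda r: r["revenue"])

def report_for_support(rows):
    return sorted(rows, key=lambda r: r["tickets"])

# One extra variable covers both clusters, and any future
# cluster that sorts on some other field.
def report(rows, sort_field):
    return sorted(rows, key=lambda r: r[sort_field])
```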


It’s also worth noting that with a group of users, the most likely scenario is that their forests are somewhat disjoint. That follows intuitively from the observation that if the forests weren’t somewhat disjoint, the users would effectively all be doing the same job, but for most occupations the work is deliberately partitioned.


While we can use size as a measure of quantity, it is a little easier to think in terms of ‘scope’. The two are basically similar, but it’s better to talk about the whole scope of one’s work, and then relate that back to a series of programs, each of a specific size, that collectively cover the entire workflow. Looking at it that way introduces the second problem with programs that are siloed. They pull off some chunk of the workflow and focus exclusively on executing just those instructions. It’s different from the forests caused by similar users, in that the silos are usually mutually exclusive; there are very few overlaps.


In order for the users to achieve their higher-level objectives, they have to go from silo to silo, executing code with specific variables, and in between export the data out of one silo and import it into another. It’s easy to view import/export as just being more code with variability, but that misses the distinction that it only exists because the workflow crosses multiple silos, so it is purely artificial complexity. If there were just one, larger, more complete program, there would be no need to export/import. Obviously, any effort the user spends navigating between the silos is artificial as well.
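
A hypothetical sketch of that artificial complexity, with invented silos and an invented file hand-off:

```python
import csv

# Silo A can only hand its results off by exporting a file.
def silo_a_export(totals, path):
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for key, value in totals.items():
            writer.writerow([key, value])

# Silo B can only receive work by importing that file again.
def silo_b_import(path):
    with open(path, newline="") as f:
        return {key: float(value) for key, value in csv.reader(f)}

# Neither function moves the user's actual work forward; both
# exist only because the workflow crosses two silos.
```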


If the scope of a given workflow crosses a lot of silos, it’s not hard to imagine the artificial work adding up to more than the base work. Because of that, it often makes a lot more sense, from a user-efficiency standpoint, to build a series of components that are easily interconnectable, and just use programs as entry points to bind them together. Then the lighter programs can be tailored with different components to closely match the user sets. That is far better than building a number of big siloed programs.
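
A hedged sketch of that component style (all of the component names are invented): the pieces are small and interconnectable, and the ‘program’ shrinks to a thin entry point that binds a selection of them together.

```python
# Small, reusable components; each does one step and can be
# connected to any of the others.
def load(source):
    return [line.strip() for line in source]

def drop_empty(records):
    return [r for r in records if r]

def count(records):
    return len(records)

# The 'program' is just a thin entry point binding components
# into one workflow; a different user set gets a different,
# equally thin binding, not a whole new siloed program.
def line_count_tool(source):
    return count(drop_empty(load(source)))
```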


It’s also worth noting here that user variation tends to shift with time. Users start with a limited number of workflows and generally grow or move around, which is often the underlying problem behind scope creep.


If we have to construct sets of instructions with variability that match a changing landscape of users, requirements, and even silos, then it is best to direct that effort toward producing code with the widest possible scope, given the time requirements. Code with just a few variables handles a small scope; with more variables it can handle a much larger one. If it is somewhat abstract, it can handle even more. Ultimately, optimal development carefully matches the scope of the code to the scope of the user workflows, but leaves it all composable and assembled at the latest possible moment.
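
One possible sketch of assembling at the latest possible moment (the registry and the spec format here are assumptions, not a prescription):

```python
# Hypothetical components, registered by name.
COMPONENTS = {
    "load": lambda source: [line.strip() for line in source],
    "drop_empty": lambda records: [r for r in records if r],
    "count": lambda records: len(records),
}

# The scope of the final program is decided by composing a
# spec at startup, not by writing yet another siloed body
# of code.
def assemble(spec):
    steps = [COMPONENTS[name] for name in spec]
    def program(data):
        for step in steps:
            data = step(data)
        return data
    return program

counter = assemble(["load", "drop_empty", "count"])
print(counter(["a ", "", " b"]))  # 2
```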


In summary, it’s not programming if it can’t vary. If it does vary, it should match what the users are doing, and if they vary a lot, the code should vary a lot as well. If there are a lot of users, moving around a lot, then there will be too many variables for just basic code, so it needs some other level of organization to break it down into manageable components. These are then seamlessly reassembled to match the users’ behavior. If you step up from the raw variability and deal with it from a more abstract perspective, you can massively widen the scope, getting a much better fit for the users. Given a lot of users, moving around a lot, doing a wide variety of different tasks, you should be able to craft components that reduce the final amount of code by orders of magnitude, which is obviously a lot less time and far fewer resources to build, and the users will spend a lot less time with import/export and navigation as well.

Monday, October 5, 2020

The Good, the Bad, and the Ugly

Some people believe that the quality of code is subjective. That one person’s awful code is another person’s beautiful code.

There are some sprinkles of truth buried in those beliefs, but not enough to actually validate them.


The key point is that the author of any piece of code is often biased. Their code is always good, no matter how bad it really is. So, if we want a better assessment of ‘quality’, it has to come from the perspective of the later coders who end up working on the code once the original author has left.


If you can read code, then bad code is obvious. It takes longer than normal to work through its issues, to figure out what it is doing. So it’s easily a time problem. If the code should have taken a day to understand, but a week later you are still confused, then it is pretty safe to say that it is bad. Well, almost. Many programmers can’t ‘read’ code; they are functionally illiterate. They can write stuff, or copy and paste it from somewhere else, but it would take them a very long time to read and understand anyone else’s code, whether it was good or bad.


That’s what injects so much confusion into quantifying ‘quality’. If a programmer struggles for a week to understand some code, you can’t tell if it’s the code or the programmer, or both, that is having the problem.


On top of this, there are stylistic issues, idioms, and abstractions. Encountering a new idiom in someone else’s code, for example, can really slow down the reader, particularly if they don’t recognize it as such. What might be weird and unnatural to one programmer might be a very common idiom to a different group of them. 
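
For instance, Python’s tuple-unpacking swap is a trivial example of an idiom that can read as strange to anyone who hasn’t met it:

```python
# A common Python idiom: swap two values with no temporary.
a, b = 1, 2
a, b = b, a

# A reader used to other languages may expect the longhand
# form, and find the one-liner above weird and unnatural.
temp = a
a = b
b = temp
```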


Even with all of these issues, we can still think in terms of an expected base time for understanding: the time it would take a programmer with the correct knowledge. So we can talk about bad code as being way slower than that to read, okay code as being more or less readable, and good code as being easily extendable as well.


There are a huge number of different ways to make code bad. It can be obfuscated, fragmented, or stupidly clever, or it can obscure its intent using all sorts of tricks. Bad formatting, rampant inconsistencies, and awful naming help a lot too.


There are way fewer versions that are okay. There are still lots of different permutations, but it is far easier, and leaves far more room for creativity, to write bad code. Okay code is readable, and it isn’t onerous to make a bug fix.
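
As a small, hypothetical contrast, here is the same discount calculation written badly and then written to an okay standard:

```python
# Bad: cryptic names, a magic number, and a 'clever' one-liner
# that obscures its own intent.
def d(p, c):
    return p - p * (0.1 if c else 0) if p > 0 else 0

# Okay: readable, and not onerous to fix if a bug turns up.
MEMBER_DISCOUNT_RATE = 0.10

def discounted_price(price, is_member):
    if price <= 0:
        return 0
    if is_member:
        return price - price * MEMBER_DISCOUNT_RATE
    return price
```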


There are a fairly small number of variations for good code, primarily because the code has to be technically strong, but also map back to the business problems or implement a strong abstraction. If someone asks for an extension, and you find it really straightforward to make those changes, then you know that the code is good.
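
One hypothetical shape that good code often takes: the mapping back to the business problem is explicit, so an extension is nearly trivial.

```python
# The business rules are stated directly as data.
TAX_RATES = {
    "food": 0.05,
    "books": 0.00,
}

def tax_for(category, amount):
    return amount * TAX_RATES[category]

# Extension request: 'add electronics at 13%'. The change is
# one obvious line, a strong sign that the code is good.
TAX_RATES["electronics"] = 0.13
```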


There is such a thing as great code, but it is exceedingly rare. Usually, it is abstract, but in a way that lets people leverage its power for all sorts of unexpected usage, and it too is pretty easy to extend (if you understand the abstraction).


It’s worth noting that just because code is believed to be working in production doesn’t make it good code. Most systems have at least hundreds of bugs that exist but haven’t been triggered yet, and rust is always eating away at the weaker constructs. Working code contributes to the system’s current stability, but it can also freeze the ability to keep developing the system, and it can become unstable when the usage suddenly changes. It might just be a bomb waiting to go off at an inconvenient time.


So, we can’t really point to just one version of the code and say that it is ‘perfect’, but we can get a sense of quality, and also an understanding that as quality increases, there are considerably fewer possible variations. A small number of the actual implementations are good, a larger number are okay, and the rest are just bad, though they may not cause immediate grief. Experienced, literate programmers can tell the difference, so it is far less subjective than most people realize.