Thursday, December 29, 2022

Et cetera

Recently, I started queuing up posts for Thursdays at 9pm. Not really sure why, just seems like a good time to publish.

I figured there would be near-zero readership over the holidays; people have much better things to do. So, initially, I was going to skip today, but I guess I’ll just ramble a little bit about the future anyways.

We’re at a strange juncture in our history, where our overall trajectory doesn’t look great, but there are still a few things “in the wind” that could be epic game changers.

The biggest is that fusion is looking promising again. Unlimited clean energy would massively disrupt our societies. Lots of science fiction books out there explain why, but the shortest answer is that shared, expensive infrastructure forces us all together. Break that, and give everybody dependable robots, and we’ll disperse everywhere. Maybe not so great for the planet, but it would definitely make it harder for people to manipulate and exploit many of us.

Another great pending innovation is self-driving vehicles. Forget cars, we could build mobile houses, which would be amazing. It’s like Howl's Moving Castle, but better. Your whole house is a mobile set of rooms that reconnect at different times, in different places. You could share parts of your home with your family or friends. Everything would be constantly mobile, particularly if cheap energy wasn’t a constraint.

In general, robotics is radically improving, and robots are going to cause massive class upheavals in our societies. Machines caused some trouble long ago, making it harder in some parts of the world for people to find good jobs that provide lifetime stability. Robots will be far more transformative. If they wipe out every sort of blue-collar occupation, there will be a great swath of people who are left without any options. Enough people that it could be a real problem.

But robots also have the potential to do great things. I could see massive salvage yards where millions of robots, some huge, others tiny, take all of our trash and gradually recycle it back into its raw elements. Successively smaller robots separate and chop up the material, until the pieces are fine enough to be sorted to near purity. Swarms of little nanorobots partition the remainder into different particles. Trash farming. We can’t do this today; the electricity cost is massive. But if that suddenly fell …

Some of the growing ramifications of AI are looking interesting as well. While it still seems to be a little rigid right now, a bunch of these new technologies are also pretty darn amazing. I’m just hoping that they will help lift us up and not just enable the worst exploits. Too many people want to control others for the purpose of extracting wealth; it’s pointless, but they persist. We don’t need more billionaires, instead, we really need to come together to fix our mistakes. To accomplish this we need to be collectively smarter, so maybe AI will actually support that somehow.

As for software, it’s probably time to finally get really serious about it. AI may wipe out programming, but if it doesn’t we really need to do a much better job of engineering these massive systems that rule our lives. As well, we should be actively preventing unscrupulous people from using software for their own malicious purposes. We can’t just keep on quickly hacking out stuff and then walking away. It’s just contributing to the instability of the world.

I guess that’s it. The future world could be really comfortable, smart, and even luxurious, or it may be a horrific police state where everything you do is monitored. Either way, it is software and energy that help enable these futures.

Thursday, December 15, 2022

Determinism

When you go to release a piece of software, you really, really, really want to know what that software will do in the wild. That is a big part of the mastery of programming.

People often ask for code that does very specific things. You write exactly that code. Each and every time it runs it will correctly do what is asked. Its behavior in ‘all’ circumstances is what you expect it to be.

There will be bugs, of course.

Depending on your skill level, there is a relatively fixed amount of targeted testing that can achieve any particular desired quality level.

For instance, if you are a strong programmer and you need near-perfect quality, depending on the size of the codebase, the testing might only require a few months or a couple of quarters of intensive effort to confirm that it always works correctly.

Aside from bugs, there are plenty of other things in code that can affect whether or not its execution will match your expectations. We’ll use the word ‘determinism’ to describe these.

Code is deterministic if it only ever does what you expect. It is non-deterministic if its behavior is unpredictable; you don’t know how it works or what it will do when it runs.

A fully deterministic piece of code takes some inputs, applies a reasonable computation on them, and then produces some outputs. This happens each and every time you run the code. It seems simple enough.

So, the first point in making code deterministic is to understand and get control of all of the inputs.

We learned over fifty years ago that global variables were a bad idea. If you look at a function that relies on a global variable then not only do you have to worry about the inputs, but you also have to have some deep understanding of all of the possible states of that global. If it’s a signed integer, for instance, it could be any value between the minimum and maximum integer values.

For one calculation that doesn’t seem so bad, but as the code follows an execution tree through a bunch of functions, you can’t assume that the integer hasn’t changed somewhere along the way. It may have been zero at the start, but somewhere else deep down in the stack it flipped to 42. You can’t tell whether that is possible, or whether it actually happened, just by looking at a small number of those functions. You’d have to look at everything that touches that global, and figure out if it ever got triggered. It is bad for single-threaded processes; it is far worse for multi-threaded code.

This is to say that referencing a global variable in your function adds some non-determinism to the function itself. Maybe a little, maybe a lot. Now, you aren’t entirely sure what to expect anymore when it executes. It’s not a massive problem for a single integer, but for each such occurrence, it only gets worse.

So any globals, whether they are primitive values, objects, singletons, etc. start to add non-determinism to each and every bit of code that utilizes them. They infect the code.
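Here is a minimal Python sketch of that infection (this blog has no code of its own, so the names are invented for illustration):

```python
# A module-level variable, mutated from who-knows-where.
discount_rate = 0.0

def net_price(price):
    # The result depends on discount_rate as well as price, but only
    # price appears in the signature. Any code anywhere that assigns
    # to discount_rate silently changes what this function returns.
    return price * (1.0 - discount_rate)
```

Calling net_price(100.0) might return 100.0, or 75.0, or anything else, depending on what the rest of the program has done to the global.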

Oddly the fix isn’t hard.

We just make sure that every input in any function is declared as part of the function parameters. Then we know, way at the top, that all of the underlying functions are working on the same instance of that global and that they are referencing the same value, each and every time.
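Continuing the sketch, here is the same function with the global hoisted into its parameters:

```python
def net_price(price, discount_rate):
    # Every input is now explicit in the signature; the same
    # arguments always produce the same result, no matter what
    # the rest of the program is doing.
    return price * (1.0 - discount_rate)

# Only the top level resolves the value, once, then threads it down.
total = net_price(100.0, 0.25)
```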

In that sense, except at the very top, using globally accessible ‘variables’ degrades the determinism of the code. Constants and immutable globals are okay; it’s just the stuff that varies that is dangerous.

We can further explore this by understanding that there is always a greater context playing out for our code. We can refer to that ‘execution context’ as ‘state’. If the only thing in that state is fixed data, then there is only one overall global state, and everything below that is deterministic. If there are N variables, with M1, M2, …, MN possible values respectively, then the number of possible states is the product M1 × M2 × … × MN, which is a crazy large number. Too many for us to imagine, too many for us to craft correct expectations about the execution.
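A quick back-of-the-envelope sketch of how fast that product blows up (the mix of variables here is made up):

```python
from math import prod

# Hypothetical state space: one boolean flag, one three-way mode,
# one ten-slot setting, and a single 32-bit integer global.
possible_values = [2, 3, 10, 2**32]

print(prod(possible_values))  # 257698037760 distinct global states
```

Four modest variables already produce more states than anyone can hold in their head, and real codebases have hundreds.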

But programmers use globals all of the time because it is easier. You just declare stuff once in some file, then keep referencing it everywhere. Oddly, shoving those globals into the function args is only slightly more work, but people still love to cut this corner aggressively.

Along with globals, a long time ago we realized that using ‘goto’ statements caused similar problems. They allow the flow of the code to just jump out to strange and unpredictable places. Gotos are to flow what globals are to state. You can’t look at a function and produce correct expectations; you can not fully “determine” what that code is going to do without a lot of jumping around, searching, and contemplating other parts of the codebase. The flow of control itself is non-deterministic.

It’s not that the goto operations are all bad; after all, they did return to languages in limited circumstances like setjmp/longjmp and later in try/catch statements. It’s just that reckless usage cranks up the non-determinism, which makes it increasingly impossible to figure out what this stuff does in the wild.

The try/catch idiom, for example, does sometimes lightly re-introduce the goto problem. Any usage splits the outbound flow of the code in two. If there is a stack of code, each layer of which is doing one or more try/catch handlings, then each one adds a ×2 branch to the ways the code can execute. A couple of ×2 branches are easy to understand, but throwing lots of them all over the execution stack will explode the permutations beyond anyone’s ability to imagine the behavior, and thus make everything non-deterministic again.

That’s why the best practice is to always restrict try/catch handling to be only right at the bottom or right at the top. Put it in the bottom if you can do something meaningful right there, otherwise, let it go right up to the top and get dealt with just once. If you use try/catch diligently it cuts down on your own flow mechanics, but if you don’t apply any discipline the whole thing becomes increasingly non-deterministic.
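A minimal Python sketch of that discipline (Python spells the idiom try/except; the function names are invented):

```python
def parse_row(line):
    # Bottom: handle the error right here, only because something
    # meaningful can be done locally: substitute a sentinel value.
    try:
        return int(line)
    except ValueError:
        return 0

def load_totals(path):
    # Middle layers add no handlers of their own; any I/O error
    # flows straight up without forking the control flow.
    with open(path) as f:
        return [parse_row(line) for line in f]

def main():
    # Top: everything else gets dealt with exactly once, right here.
    try:
        print(sum(load_totals("numbers.txt")))
    except OSError as err:
        print(f"failed to read input: {err}")

if __name__ == "__main__":
    main()
```

Two handlers, each with one clear job, instead of a ×2 branch at every level of the stack.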

As well as globals and gotos, non-determinism can be introduced at the computational and architectural levels too. The problem space itself might hold some non-deterministic or ambiguous elements, which can’t be tackled by brute force. Examples of these types of problems are quite complex, but in general, we tend to capture them as issues like transactional integrity, the dirty write problem, CAP, consensus, etc. They occur because the code cannot ever statically make the right choice, so, you need to wrap the theoretical aspects in some deep complexity to either make it deterministic or at least to make it as nearly deterministic as possible.
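As one small taste of that class of problem, here is a Python sketch of the classic lost update (a dirty write); the thread and iteration counts are arbitrary:

```python
import threading

counter = 0

def bump(times):
    global counter
    for _ in range(times):
        counter += 1  # read-modify-write: three steps, not atomic

threads = [threading.Thread(target=bump, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# May print less than 400000: two threads read the same value,
# both add one, and one of the writes is silently lost.
print(counter)
```

No amount of staring at bump alone predicts the output; the non-determinism lives in the interleaving, which is why it has to be engineered away with locks, transactions, or consensus protocols.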

Deliberately ensuring determinism in your code is part of mastering coding. Making it readable and future-proof, and estimating the work accurately, are also necessary. Throwing together some code that kinda works is easy. Throwing together a lot of code that always works is difficult.

Thursday, December 8, 2022

Definitions

Communication is vital for succeeding in any activity that requires the participation of more than one individual.

Everyone has to be on the same page, they have to coordinate their actions.

All forms of communication, including verbal and written, are essentially compressed data. That is, we use higher-level terminology to quickly discuss complex topics. Underneath, those terms rely on everyone expanding them with their own definitions.

Definitions have a denotation, essentially a meaning with a specific set of boundaries. But they also have connotations and colloquial meanings, a further set of attached understandings that can depend on context, region, or culture.

Most definitions are victims of history. They’ve been banged up by past events. Once they have been bent out of shape, they start detracting from the communication. Because of this, most professional fields use their own strict definitions, which can be quite different from the common ones. It ensures that any communication is precise.

So for coordination, in the very best case, an entire group of people already know and understand the definitions of the terms that they are using to communicate. Then the information is passed around correctly.

Often at the heart of many conflicts is a misunderstanding of one or more of the underlying definitions. Two people will argue with each other because their definitions are not aligned. Clearing up those definitions will usually dissipate the issues.

Besides accidental disagreements, we also see people purposely mess with definitions. They stretch them in order to twist out their “logic” to press some personal agenda. It is a popular tactic in politics. It is fueled by intentionally vague definitions that are constantly in motion. By now many of these definitions are entirely nonsensical.

A leading indicator of a questionable argument is someone starting it off by changing some of the known definitions. They may do this to clarify the terminology, which can be good, but they may also be trying to make it even cloudier, which is definitely not. It is difficult to discern between those two situations.

Usually, the difference is that people who want clarity will tighten their definitions, so they’ll reduce the scope or boundaries. People who are attempting to misuse definitions will shift them around or weaken them to include more stuff. They get less precise.

With a set of broken definitions, you can reconnect them in any way you want and pretend to prove basically anything. If people don’t pick up on the deception, they can be left believing that some falsehood is true. So, it’s a powerful weapon wielded by unsavory people.

This is why it is so important to be tight and consistent with definitions and to fully understand them. It’s why most introductory subject courses are basically just endless definitions. Definitions are the weak link in our ability to communicate. Bad ones can masquerade as primitives in “logic” but given they are broken, any derived conclusions are meaningless. To communicate or even extend our knowledge we need a strong base of definitions. It’s not pedantic, it is precise; there is a difference.

Thursday, December 1, 2022

Problem and Solution Spaces

The ‘problem’ space is the physical/digital issues and data that are facing a group of users for a given domain.

The ‘solution’ space is the set of all functionality embedded into a software system that can be used to address some of the issues in that problem space.

Keeping those two strictly separate from each other makes it far easier to design complex systems.

A ‘feature’ is something the users need in order to deal with one or more of their issues in the problem space. That feature then maps over to one or more parts of functionality in the solution space.

It’s a bad idea to let any solution terminology infect the description of the problem specifications. It is usually a good idea to push a lot of the problem terminology into the solution design.

That is, a common feature in a lot of systems is the ability to manage a particular category of data, which is often mapped to ‘new’, ‘edit’, and ‘delete’ GUI screens in the software. It would be a mistake to call “editing” that data a feature. It is not a feature, it is just one of the parts of the functionality needed to address “managing” that data. It would be good to be more precise about the naming of that data. So, the mapping between the two spaces is asymmetrical.

Now it may seem overly pedantic to discuss it this way, but there are deep-seated reasons why it is better.

An obvious one is that just implementing part of the functionality is not actually implementing the whole feature. So, we see a lot of systems out there with ‘incomplete’ features and they are extremely frustrating to use. Partially solving any problems often just makes those problems worse.

Another good reason to do this is that the terminology and needs of the problem space do not always map nicely into the solution space. In fact, usually, the mapping is kind of horribly denormalized, if not entirely irrational. People want to work in a way that is comfortable for them, which is usually far more difficult to implement. Because of this, we often stumble into the problem of whether a ‘simplification’ of any kind is just effective for the programmer or whether it really benefits the user. It is rarely both. It’s a classic trade-off.

If you do some ‘analysis’, that should be purely in the problem space. If you do some ‘design’, then it quite obviously needs to be in the solution space. If they are kept distinct, then we have a way of not blurring the lines and injecting noise into the outputs. The users need to keep track of something happening in their business. That is their feature. If that actuates into a portal screen or a report or some other construction, that is part of the design of the solution.

One place where it is super common to see a mess is in classic requirements. They are often a haphazard mix of features, implementations, designs, conventions, abstractions, etc. Pretty much anything from either space gets brutally mixed and matched and combined into a swamp. This is the key reason why they don’t get cross-checked properly and validated early on. This leads to confusion, ambiguities, and tonnes of scope creep. It is a bigger reason why a lot of systems derail than methodology is. If the requirements are convoluted, then any attempts to clean them up without splitting out the spaces will fail too, and sometimes just make the confusion worse.

If you had a clean list of features and another clean list of functions, then you could just mechanically go through all of them and see which functions address which features. You can see if the feature is fully or partially addressed. You can see useless functions. You would also see which features are not addressed at all. That would let you double-check that the solution design will actually help to ensure it is a good fit.
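A minimal Python sketch of that mechanical cross-check; the feature and function names are invented for illustration:

```python
# Map each feature (problem space) to the functions (solution space)
# that are supposed to address it.
feature_map = {
    "manage client records": {"new client", "edit client", "delete client"},
    "track outstanding invoices": set(),  # nothing maps to it yet
}
implemented = {"new client", "edit client", "export audit log"}

for feature, needed in feature_map.items():
    done = needed & implemented
    if not done:
        print(f"unaddressed feature: {feature}")
    elif done != needed:
        print(f"partially addressed: {feature} (missing {needed - done})")

# Functions that serve no feature at all are candidates for removal.
useless = implemented - set().union(*feature_map.values())
print(f"useless functions: {useless}")
```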

A big problem with modern software development is that the definitions we use in our industry are vague, blurry, convoluted, and often deliberately scrambled.