Monday, October 31, 2022

Knowable

Some people believe that the universe is intrinsically unknowable.

That is, there will always be mysterious things happening around us that we won’t be able to explain. It’s a somewhat romantic perspective on the world.

Other people believe the universe operates by a series of nearly-strict formal rules, deep underneath.

This means that any and every behavior or event is explainable, it’s just that we either don’t have deep enough knowledge yet, or the frequency of occurrence is so tiny that we haven’t been able to imagine the event before it occurred.

From that view, it’s not that the universe is random, but rather that our encounters with it are probabilistic. Things that happen, that you didn’t expect to happen, only happen rarely. So any surprise is a lack of expectation, not some mysterious force.

In that second perspective, if you view the world with objectivity, you can gradually build up a better sense of the types of things that can and will occur, and doing so brings you a bit closer to understanding our surrounding reality. You become a realist, not hoping for the best, nor complaining about the worst. Just accepting things as they are.

If you see the universe that way, then all things of interest are on some trajectory. You can then make inferences about where they are going, but also understand that any sort of unusual collision may change their course. Then you can catalog any of the known possible collisions and their frequencies. Obviously, common collisions are part of your assumed trajectory, but less common ones deserve consideration too.

If you understand the allowable paths for any given trajectory, you can make arrangements to route through them to the desired result. That is, you proactively know where you are going, while reactively steering through the known and occasionally unexpected obstacles to get there.

It also makes it easier to understand what just happened. Look for some uncommon collision and reinterpret its frequency as needed. A bunch of trajectory shifts in unlikely places can interact to alter seemingly unrelated trajectories, the proverbial perfect storm.

A mysterious universe is more appealing to some people, but collectively we do seem to have already acquired enough knowledge to dispute it.

Our next great feat is to be able to leverage what we know, and will soon learn, to keep our own trajectory from spiraling down. We’ve mastered living in harsh climates but have not figured out how to prevent unintended side effects. We shape the world around us for our own convenience, but it is still destructive. If we can sort out these mistakes while picking better destinations, we can live up to our intellectual potential.

Thursday, October 27, 2022

Bugs

Most bugs in our code have yet to be discovered.

They exist, either as mistakes, misunderstandings, typos, or cheats. But the circumstances that would bring them into the light, as a pending development priority, have just not occurred yet.

There was a popular movement a while back that insisted that a limited form of testing would magically ensure that the code was bug-free. There was another, less popular, movement that insisted that doing a tonne of external work could prove the ideas behind the code were right. Both of these approaches are predicated on the idea that all bugs can be removed.

Pretty much the best and only reasonable definition of a ‘bug’ is that it is some annoying behavior of software that bugs someone. That is an incredibly wide definition that would include things like ugly or awkward interfaces, mismatches in conventions, inconsistencies, under-simplifications, and over-simplifications.

The healthy way to see bugs is that software is not, and never will be, “perfect”. At best, if you work on it for a long time you might converge closer to perfection, but really you’ll never get there in your lifetime. So, bugs are an inevitable fact of building software. They exist, there are lots of them, and whenever you get a chance you should correct them, even if they are small or currently unnoticed.

The quality of any code comes down to the number of bugs and the readability of the code itself. That is, it is the short-term usage, and the long-term extensibility, of the stuff you are building. The work to remove bugs follows an exponential curve: it is easy to remove easy bugs, but it is a crazy amount of work to find and correct the difficult ones. Some bugs are just out of reach of the available time frame. You may or may not know about them, but you’d never get the time to fix them either way.

Because bugs happen, any sort of release/deployment process needs to deal with them. That is, there is a strict, slow, correct way to release the next upgrade, and there is a fast, dirty, somewhat risky way to just get a patch into the wild. You have to have both, but the default is to always take the longer, safer way first.

There are lots of things coders can do to prevent and catch bugs. Usually, they are ‘habit’ driven. That is, without knowing any better, some of the ways people construct code are far more likely to be buggy. But if they just made small changes to the way they were working, fewer bugs would happen. As well, there are lots of ways beyond independent ‘tests’ that would help detect and prevent bugs from getting out into the wild. We have to accept bugs, but we can also change our working habits to reduce their frequency.

Monday, October 24, 2022

Naming

Naming is hard. It is the beginning of all confusion and laziness in code, and it is usually where most bugs hide. If the code is littered with bad names, you should be very suspicious of it.

Oddly, part of what makes the problem hard is that people expect it to come easily. So they get frustrated when it doesn’t, and then they just cheat and use really bad names instead.

As people see other people’s badly named mistakes, they accidentally follow suit, so the conventions really tend to degrade with each generation. And since they are mistakes, it becomes even harder to understand why they were chosen, making it all even more confusing.

But naming is an explicit indication that you understand what you are doing, and that you think about it clearly. The two most important characteristics of good programmers are that the mechanics are clear in their heads (unique) and that they understand what is happening (precise). With those two things, most names, most of the time, are fairly obvious. If you also understand the degree of generality you need, then you know how to properly lift the names as well.

How do you know if the names are bad?

Easy. Go verbally explain your code to another programmer.

If your names are self-explanatory, you will use them directly while you are talking. If each name needs a description, explanation, or some other help, then they are bad. More to the point, if what you are trying to do needs any overly complex explanations and those explanations cannot be directly abstracted away in one step from the work itself (e.g. objects, models, patterns, idioms, data structures, etc.), then the whole thing is messy and convoluted, and likely does not work in the way you expect it to.
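
As a small, purely hypothetical sketch (the names and the example are invented here for illustration, in Python), compare a routine you would struggle to talk through with one whose names carry the explanation on their own:

    # Hard to explain out loud: every name needs a side explanation.
    def proc(d, f):
        r = []
        for x in d:
            if x[1] > f:
                r.append(x[0])
        return r

    # Reads the way you would describe it to another programmer.
    def customers_over_credit_limit(customers, credit_limit):
        over_limit = []
        for name, balance in customers:
            if balance > credit_limit:
                over_limit.append(name)
        return over_limit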

Because naming is hard but fundamental, it is one of the core skills that you should work on developing very early in your career. If you wait until you’ve learned other stuff first, your naming habits will be so poor that the stuff you produce won’t work even if you know how to do it properly.

Thursday, October 20, 2022

Operational Issues

If there is a problem with a system in “production”, rather obviously that should be recorded with all of the relevant facts and hopefully an ability to reproduce it.

Then the problem should be triaged:
  • Is it an operational problem? Such as configuration or resources?
  • Is it a data administration problem? Bad or stale data?
  • Is it a development problem? Our code or a vendor’s code is incorrect?

There should be a run book that lists all old and new problems this way.
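
As a rough sketch of what each run book entry might capture (the field names here are assumptions, not a prescribed format), in Python:

    from dataclasses import dataclass, field

    # Hypothetical run book entry; the categories mirror the triage above.
    @dataclass
    class RunBookEntry:
        title: str
        category: str         # "operational", "data administration", or "development"
        symptoms: str         # what was observed in production
        reproduction: str     # steps or data needed to reproduce it, if known
        resolution: str       # the vetted fix or workaround
        occurrences: list = field(default_factory=list)  # timestamps of each occurrence

    entry = RunBookEntry(
        title="Report queue stalls overnight",
        category="operational",
        symptoms="Queue depth keeps growing after the nightly batch starts",
        reproduction="Run the nightly batch with fewer than 4 workers",
        resolution="Bump the worker count, restart the queue service",
    )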

If a problem is operational, say we’ve run out of some type of resource, then two things need to occur:
  1. Bump up the resource, and restart the system
  2. Analyze the problem and make recommendations to avoid it in the future.

If the problem is data administration, then there are a few options:
  • Was the data changed recently? Who changed it and why?
  • Is it stale? Why did it fall out of sync?
  • Is there a collision between usage? Both old and new values are needed at the same time?
  • Who is responsible for changing the data? Who is responsible for making sure it is correct?

If it is a development problem, then:
  • Is there a workaround?
  • How important is it that this gets fixed?
  • When can we schedule the analysis, design, coding, and testing efforts necessary to change it?

In the case of resources such as operating system configuration parameters, the first occurrence for any system will be a surprise. But it should be logged both against the system itself and against the underlying tech stack, so that later, even if it happens in a completely different system, the solution to correct it quickly is known and already vetted.

If it is a well-known tech stack issue, then operations can address it immediately, and only later let the developers know that it had occurred.

If the problem is infrequent or mysterious, then operations may ask for developer involvement in order to do a full investigation. If the problem is repeating, trivial, or obvious, then they should be able to handle it on their own. If a developer needs to get involved, they often need full and complete access to production, so it is expensive and not something you want to occur very often.

For common and recurring problems, operations should be empowered to handle them immediately, on their own. For any system, the default behavior should be to reboot immediately. If reboots aren’t “safe”, then that needs to be corrected right away (before any other new work commences).

Besides reactively responding to any problems with the system, operations need to be proactive as well. They should set up their own tooling that fully monitors all of the systems they are running and alerts them to limited resource issues long before they occur. Non-trivial usage issues come from the users.
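
A minimal sketch of that kind of proactive check, assuming a simple disk-space threshold (the path and the limit are invented for illustration), in Python:

    import shutil

    # Alert well before the resource is exhausted, not when it runs out.
    WARN_AT_PERCENT_FREE = 20  # hypothetical threshold

    def check_disk(path="/"):
        usage = shutil.disk_usage(path)
        percent_free = usage.free * 100 // usage.total
        if percent_free < WARN_AT_PERCENT_FREE:
            print(f"ALERT: {path} has only {percent_free}% free space left")
        return percent_free

    check_disk()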

Upgrades and bug fixes should be tested and queued up by development, but deployed by operations at their convenience. Since operations are actively monitoring all systems, they are best able to decide when any changes are made. Operations are also responsible for tracking any underlying infrastructural changes, assessing the potential impacts, and scheduling them, although they may have to consult with vendors or in-house development to get more information.

Friday, October 14, 2022

Data Engineering

The foundation of all software systems is persistent data.

That is, a big part of any solution provided by a software system is the ability to digitize events, inventories, and conversations. Then collect them together and make them reliably available in the digital realm.

Like anything else, these foundations have to be in place first, before you can build any other type of computation on top. Building on top of broken foundations is a waste of time and resources.

An organization needs only one copy of any of the possible trillions of digital records. They need to know where it is located, and unless it is accessed frequently they don’t need speed. If it takes a few minutes to get to infrequent data, that is fine. In some cases, an hour might be acceptable as well.

Obviously, size, frequency, and usage are critical things that need to be understood.

For every different type of record, at least one person in the company needs to really understand it: its structure, its usage, its frequency, and any special cases that will be seen during collection. If you just collect data blindly, the quality of that data will always be very poor. You get out of data the effort you put into making sure it is right.

Often, besides the primary copy of the record, a lot of secondary copies get created as well. Sometimes for performance, more often because of politics. That is fine if all of them are ‘read-only’, but it is a really, really bad mistake if people are updating those secondary copies.

You have to capture the data as it exists. You can’t flatten structural data without losing information. You often can’t inflate it back due to ambiguities. So, if you spend the effort to capture it into a primary database, it is always worth any time and effort to get the structure correct. Collecting a lot of broken databases is a waste of resources.

If you have just one primary copy of good-quality data, then building out any other computations and interfaces on top of it is straightforward. If you find that what you are building is instead extremely complex, then it is often because of the poor foundations. It is way, way more efficient to fix the foundations first, instead of just piling more problems on top. The house of cards only ever gets worse; it will not fix itself, and the real problem is below.

So, the core of data engineering is to be able to craft models for the incoming data that correctly list out the size, frequency, usage, and structure of each different ‘type’ of record. All of them. Then the other part is to ensure that for any organization there is only one primary source and that any secondary sources are always read-only. If that is in place, crafting databases, ETLs, or any other sort of data piping will go well. With that as a foundation, building anything on top will be cheaper and far better quality.
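
A tiny sketch of what such a catalog of record types might look like (the field names and the example record are assumptions, purely for illustration), in Python:

    from dataclasses import dataclass

    # One entry per 'type' of record the organization keeps.
    @dataclass
    class RecordType:
        name: str
        structure: dict         # field name -> type, captured as-is, never flattened
        size_bytes: int         # typical size of one record
        frequency_per_day: int  # how often new records arrive
        usage: str              # who reads it, and how often
        primary_source: str     # the single writable home of this data
        secondary_copies: list  # read-only replicas, if any

    invoice = RecordType(
        name="invoice",
        structure={"id": "int", "customer": "str", "lines": "list[line_item]"},
        size_bytes=2_000,
        frequency_per_day=5_000,
        usage="billing reads daily; analytics reads monthly",
        primary_source="billing_db",
        secondary_copies=["reporting_warehouse (read-only)"],
    )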

Saturday, October 8, 2022

Dynamic Systems

A picture is worth a thousand words, as they like to say. But it is also frozen in time. It doesn’t change, or at least it shouldn’t. 


And it’s a great analogy for describing things as ‘static’. Basically, a photograph is a directed static capture of just one specific moment of our dynamic reality. For a lot of pictures, pretty much right afterward at least one little thing has changed, and the picture is no longer reproducible. Reality has moved on; it will never be the same again.

Now it’s worth noting that a video is a long sequence of pictures played together so that we can see a larger set of changes. While it is still static externally, it isn’t a moment anymore, it is now a period of time.  So, when you view it, it appears to be dynamic, but you can keep replaying it over and over again, so it isn't really.

Things that are dynamic can constantly change. They can change across all possible dimensions. Obviously, if we were going to try to model that, it would be impossible. 

But we can make some of the dimensions dynamic, while leaving others to be static, such as what happens with a video. Then things are not ‘entirely’ dynamic, but there are still dynamic aspects that are captured, at least for some period of time.

All of that is fairly easy to understand, but it gets a little less so when applied to mapping aspects of reality onto software. Fundamentally, what we want to do is capture some information about the world around us with our computers. It’s just that it’s highly multi-dimensional, and many of those dimensions are intrinsically dynamic. 

We should stop and talk about the multi-dimensional aspect of digital information. We mostly perceive our world as four dimensions, so we could capture any point in time and space, and combine these together to get descriptions of any type of objects. We do have higher dimensional theories, but we tend to think of our world in terms of 4D.

In software though, we have a large number of ‘variables’; that is, things that can and do vary. Each one only has a finite number of different possibilities, and at any one time there are only ever a fixed number of them, but since we tend to use a crazy large number of variables, it is worth treating each and every one as though it were essentially its own dimension. We can boil that down to a subset of variables that are somewhat independent of each other, but since that is still a fairly large number, it doesn’t really change our perspective. It is a huge number of dimensions.

A static program, then, is one where the number of dimensions, and the space each occupies, cannot change. For example, if you write a program with exactly 3 integer variables and the result is that they are added together, then we know the entire input space (all triples permuted from integer min to max) and the output space (an integer from min to max) are both fixed. The entire possible behaviour is static.
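
A minimal version of that static example, written in Python just to make it concrete:

    # Exactly three integer inputs, one integer output; the input and
    # output spaces are fixed, so the possible behaviour is entirely static.
    def add_three(a: int, b: int, c: int) -> int:
        return a + b + c

    print(add_three(1, 2, 3))  # 6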

A dynamic program, however, can introduce new variables.

The easiest way to do this is to keep a list of integers and not have a maximum list size. At some point the list will grow too large for the hardware, but we’ll ignore that for the moment. While each integer has a fixed space, the number of them is essentially ‘nearly’ infinite, which we will call very large. So, the overall space of the input if it were just a list to be added together is very large. If the output were another list, then it too would be very large. 
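
The dynamic counterpart is the same addition over a list with no fixed size (again, just a sketch in Python):

    # The number of inputs is no longer fixed; the input space is 'very large'.
    def add_all(values: list[int]) -> int:
        total = 0
        for v in values:
            total += v
        return total

    print(add_all([1, 2, 3]))          # works for 3 values
    print(add_all(list(range(1000))))  # and just as well for 1000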

In that sense a single variable is one dimensional, but a list of them adds a new dimension. That’s not a surprise, but weirdly a matrix doesn’t add a second one, while two lists or a spreadsheet do, and a DAG seems to add more than one. That is, the expressive power of data structures is multi-dimensional, but rather than being fixed static dimensions like the variables, each dimension is a “very large” one. And it is that huge bump in expressive power that makes the whole system dynamic.

But that isn’t the only way to make systems dynamic. Another way is to make the variables variable. That is, instead of only an integer, the value can be any valid primitive type in the language. But more fun is to let it also be any user-constructed composite variable, e.g. structs or objects. It is fully polymorphic. Then the code doesn’t know or care about the type or structure of the data, which means the code can be used for an essentially unlimited number of computations. So, rather than statically encoding one block for each possible type of variable, you encode the block once for all types, whether primitive, composite, or data structure.

If that is the heart of the code, basically an engine, then you can apply that code to a huge set of problems, making the usage of the code itself dynamic.
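
A small sketch of that kind of engine, assuming Python’s runtime polymorphism (the function and its behaviour are invented for illustration): one block of code that walks primitives, composites, and data structures without knowing their shape in advance.

    # One generic 'engine' that counts the primitive values in any nested
    # structure: primitives, lists, or dict-based composites alike.
    def count_values(data) -> int:
        if isinstance(data, dict):
            return sum(count_values(v) for v in data.values())
        if isinstance(data, (list, tuple, set)):
            return sum(count_values(v) for v in data)
        return 1  # any primitive (or unknown object) counts as one value

    print(count_values(42))                                       # 1
    print(count_values([1, 2, 3]))                                # 3
    print(count_values({"user": {"name": "a", "tags": [1, 2]}}))  # 3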

So, we can make some of the data dimensions dynamic and we can make some of the code dynamic. And we can pass that attribute into any of the interfaces used to interact with the system, employing the same code and data techniques to make the interface dynamic as well.

Now that we can construct dynamic computations, we can map real world dynamic behaviours onto these, so that we model them correctly. As things swing around in the real world, we can capture that correctly in the model, and build systems around that. 

We can tell if something is mismatched in a system. 

If it was encoded statically, but it suffers from scope creep all of the time, that is a clear indication that there is at least one dynamic dimension missing. 

In fact, unless the coverage of the model never changes, either some dynamic attribute was missed or the interface is incomplete. But in the latter case, any “missing features” are easy to implement. However, if something was incorrectly made static, the code and data changes are usually big, difficult, and/or expensive.

So we can really test the suitability of any underlying model used in a software solution by examining the requested list of changes, classifying them as domain increases, incomplete interfaces, or invalid modelling. New domain territory needs to extend the code and data to a new part of the problem space; incomplete interfaces and invalid modelling were covered above.

There is also a sub-branch of mathematics called “dynamic programming”, which is a bit confusing in that it predates computer programming, but its use of the term dynamic is similar to its use in this discussion. You can encode some algebraic expressions that, when applied, will dynamically converge on an answer. I believe that you can encode both the code and the data for this statically, so we have yet another category, which is effectively a dynamic computation. It’s no doubt related to the mathematical concept of uncomputability, but that knowledge is too deep for this discussion, other than alluding to the notion that dynamic behaviour may be able to spring from seemingly static roots.
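
As a loose illustration of that last category (a classic textbook example in Python, not something from the original argument): the code and the data layout below are written statically, yet the table of values builds itself up toward the answer.

    # Bottom-up 'dynamic programming': a statically written loop whose
    # intermediate results accumulate until the final value is reached.
    def fibonacci(n: int) -> int:
        table = [0, 1]
        for i in range(2, n + 1):
            table.append(table[i - 1] + table[i - 2])
        return table[n]

    print(fibonacci(10))  # 55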

Some things in reality are static. They essentially will not change for the expected lifetime of the system (usually 30 years). Other things are intrinsically dynamic. They can’t be locked down. In order to build good systems that don’t have ugly side effects, we need to implement any dynamic behavior with dynamic data or code. Trying to cheat that with static implementations will generate unintentional and undesirable side-effects with both the construction and usage of the system.